The present invention relates to an at least partially purified
protein capable of modulating DNA replication in plants, to
muteins thereof, to DNA coding therefor and to a method to confer
on one or more plant cells the capacity to provide such a
protein or mutein. The invention also relates to plants
comprising the said DNA and to the progeny thereof.

The regulation of the cell cycle in plants is poorly
understood. Most of the knowledge regarding the regulation of
DNA replication, which takes place in the S-phase of the cell
cycle, originates from experimental data obtained in yeast
and mammalian cells. However, understanding cell cycle
regulation in plant cells has become increasingly important in
agriculture, e.g. to control the growth of plants under
stress conditions, to obtain resistance against parasites that
block or modulate the cell cycle regulation, or to improve the
yield of agriculturally important crops. Further, one might be
interested in intervening in the cell cycle regulation by
allowing further rounds of DNA replication while simultaneously
preventing further cell cycle progress by blocking the
subsequent mitosis. In this way, cells may be obtained having
multiple sets of their genetic material, so that plants with
a high rate of endoreduplication may be generated. The term
"endoreduplication" means recurrent DNA replication without
consequent mitosis and cytokinesis.

From experiments in yeast, it is known that DNA
replication and mitosis are coupled events in the cell cycle
(Paulovich et al., 1997, Cell 88, 315-321). Genetic studies in
yeast, for example, suggest that the CDC7 serine-threonine kinase
plays a role in the initiation of DNA synthesis. Evidence has
been presented that CDC7 is apparently directly involved in the
activation of individual early as well as late replication
origins during S-phase (Bousset and Diffley, 1998, Genes Dev
12, 480-490; Donaldson et al., 1998, Genes Dev 12, 491-501).
The protein levels of CDC7 are constant during the cell cycle.
Activation of CDC7 as a kinase occurs at the G1/S transition
of the cell cycle and is dependent on binding to another
factor, DBF4; the activated kinase
probably acts by phosphorylating proteins at the origins (Kitada
et al., 1992, Genetics 131, 21-29; Lei et al., 1997, Genes and
Development 11, 3365-3374). In order to function as a
kinase, the CDC7 kinase may be a substrate for one or more
phosphorylation events. Overexpressed kinase-negative mutants
of CDC7 arrest yeast cells at the G1 to S transition and
inhibit growth. Further experiments showed that the
inactivation of wild-type CDC7 function can probably be
explained by titration of DBF4 by the inactive cdc7 mutant
proteins (Ohtoshi et al., 1997, Mol Gen Genet 254, 562-570).
In addition to mechanisms that control the onset of DNA
replication, other mechanisms contribute to restricting DNA
replication to once per cell cycle. For
example, the CDC16, CDC23 and CDC27 proteins are part of a high
molecular weight complex, known as the anaphase promoting
complex (APC) or cyclosome (see Romanowski and Madine, Trends
in Cell Biology 6, 184-188, 1996, and Wuarin and Nurse, Cell
85, 785-787 (1996), both incorporated herein by reference). The
complex in yeast is composed of at least 8 proteins: the TPR
(tetratricopeptide repeat) containing proteins CDC16, CDC23 and
CDC27, and five other subunits named APC1, APC2, APC4, APC5 and
APC7 (Peters et al. 1996, Science 274, 1199-1201). The APC
targets its substrates for proteolytic degradation by
catalyzing the ligation of ubiquitin molecules to these
substrates. APC-dependent proteolysis is required for the
separation of the sister chromatids at the meta- to anaphase
transition and for the final exit from mitosis. Among the APC
substrates are the anaphase inhibitor protein Pds1p and mitotic
cyclins such as cyclin B, respectively (Ciosk et al. 1998, Cell
93, 1067-1076; Cohen-Fix et al. 1996, Genes Dev 10, 3081-3093;
Sudakin et al. 1995, Mol Biol Cell 6, 185-198; Jorgensen et al.
1998, Mol Cell Biol 18, 468-476; Townsley and Ruderman 1998,
Trends Cell Biol 8, 238-244). To become active as a ubiquitin
ligase, at least CDC16, CDC23 and CDC27 need to be
phosphorylated in the M-phase (Ollendorf and Donoghue 1997, J
Biol Chem 272, 32011-32018). Activated APC persists throughout
G1 of the subsequent cell cycle to prevent premature appearance
of B-type cyclins, which would result in an uncontrolled entry
into S-phase (Irniger and Nasmyth 1997, J Cell Sci 110,
1523-1531). It has been demonstrated in yeast that mutations in
either of at least two of the APC components, CDC16 and CDC27,
can result in DNA overreplication without intervening passages
through M-phases (Heichman and Roberts 1996, Cell 85, 39-48).
CDC16, CDC23 and CDC27 are all tetratricopeptide repeat (TPR)
containing proteins. A suggested minimal consensus sequence of
the TPR motif is as follows:
X3-W-X2-L-G-X2-Y-X8-A-X3-F-X2-A-X4-P-X2
(Lamb et al. 1994, EMBO J 13, 4321-4328; X denotes any
amino acid, Xn a stretch of n of such amino acids). However,
the consensus residues can exhibit significant degeneracy and
little or no homology is present in non-consensus residues.
The hydrophobicity and size of the consensus residues, rather
than their identity, seem to be important. TPR motifs are
present in a wide variety of proteins functional in yeast and
higher eukaryotes in mitosis (including the APC protein
components CDC16, CDC23 and CDC27), transcription, splicing,
protein import and neurogenesis (Goebl and Yanagida 1991,
Trends Biochem Sci 16, 173-177). The TPR forms an α-helical
structure; tandem repeats organize into a superhelical
structure ideally suited as an interface for protein recognition
(Groves and Barford 1999, Curr Opin Struct Biol 9, 383-389).
Within the α-helix, two amphipathic domains are usually
present, one at the NH2-terminus and the other near the COOH-
terminus (Sikorski et al. 1990, Cell 60, 307-317).
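
As an illustration only, the minimal TPR consensus quoted above can be turned into a pattern for scanning a protein sequence. The following Python sketch is not part of the original disclosure: the function names are hypothetical, and because the consensus residues are degenerate, the sketch also counts how many of the eight consensus positions a 34-residue window matches instead of relying on a strict match alone.

```python
import re

# Minimal TPR consensus as quoted in the text (Lamb et al. 1994).
# "Xn" stands for a stretch of n arbitrary amino acids.
TPR_CONSENSUS = "X3-W-X2-L-G-X2-Y-X8-A-X3-F-X2-A-X4-P-X2"


def parse_consensus(consensus):
    """Return (total length, {0-based position: required residue})."""
    fixed = {}
    pos = 0
    for token in consensus.split("-"):
        if token.startswith("X"):
            pos += int(token[1:] or 1)  # a bare "X" counts as one residue
        else:
            fixed[pos] = token
            pos += 1
    return pos, fixed


def consensus_regex(consensus):
    """Compile a strict regular expression for the consensus."""
    parts = []
    for token in consensus.split("-"):
        if token.startswith("X"):
            parts.append("[A-Z]{%d}" % int(token[1:] or 1))
        else:
            parts.append(token)
    return re.compile("".join(parts))


def consensus_hits(window, consensus=TPR_CONSENSUS):
    """Count the consensus positions matched by a motif-length window."""
    length, fixed = parse_consensus(consensus)
    if len(window) != length:
        raise ValueError("window must be %d residues long" % length)
    return sum(1 for i, aa in fixed.items() if window[i] == aa)


if __name__ == "__main__":
    length, fixed = parse_consensus(TPR_CONSENSUS)
    print(length, sorted(fixed.items()))
```

The strict pattern is useful for locating canonical repeats, while the hit count gives a tolerant score for degenerate repeats; any cut-off applied to that score would be a choice of the user, not something stated in the text.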

In order to understand the mechanisms playing a role in
plant cell cycle regulation, in particular DNA replication,
and to understand endoreduplication in plants, the present
inventors isolated several novel plant DNA sequences, coding
for novel proteins, or novel amino acid sequences thereof,
involved in the modulation of DNA replication, using
degenerate PCR primers based on known genomic or cDNA
sequences, e.g. of yeast, mammals and insects.

"Capable of modulating the DNA replication in plants" is
to be understood as the capacity of a protein to alter the
natural DNA replication mechanism in the said plant, e.g. by
up- or down-regulation of the DNA replication in a way
different from the natural situation, or to a higher or lower
extent with respect to the natural situation. The natural
situation is to be understood as the situation wherein DNA
replication takes place in plants in which the DNA replication
machinery is not affected by the introduction of foreign
genetic material. Such altering includes mediating e.g. the
onset of DNA replication, the rate and extent of DNA
replication, the timing of DNA replication in the cell cycle,
and the coupling or uncoupling of DNA replication with/from the
actual subsequent cell division, etcetera.

Proteins 

By using degenerate oligonucleotides as amplification
primers, based on conserved sequence regions of the CDC7
homologue genes of Saccharomyces cerevisiae and
Schizosaccharomyces pombe and on conserved sequence regions of
the CDC27 homologue genes of Schizosaccharomyces pombe,
Aspergillus nidulans, Drosophila and human, the present
inventors surprisingly found such novel proteins and amino acid
sequences. Reference is made to the examples.

Thus, novel cDNAs and proteins comprising one or more
novel amino acid sequences were found. The present invention
therefore relates in the first place to an at least partially
purified protein, capable of modulating DNA replication in
plants, at least comprising in the amino acid sequence

a) one or more of the amino acid sequences chosen
from the group consisting of those given by SEQ ID NOS
2, 3 and 4,

b) one or more of the amino acid sequences chosen
from the group consisting of those given by SEQ ID NOS
6 and 7,

c) one or more amino acid sequences having at
least 50% amino acid identity with those of a), or

d) one or more amino acid sequences having at
least 50% amino acid identity with those of b).

By using degenerate CDC7 oligonucleotides to amplify a
PCR fragment, as indicated above and further detailed
in the examples, a novel Arabidopsis cDNA comprising the coding
sequence of a novel Arabidopsis CDC7 homologue gene was found
(SEQ ID NO 8). By comparison of the said sequence with
sequences of the EMBL and EMBLnew databanks, a genomic
Arabidopsis thaliana sequence was found (accession number
297342). In this known genomic sequence, however, only 11 exons
were identified. The novel DNA according to the present
invention, however, clearly indicated the presence of 3
additional coding sequences coding for novel amino acid
sequences (SEQ ID NO 2, 3, 4) being part of a DNA replication
modulating plant protein, homologous to yeast CDC7.

The novel amino acid sequence SEQ ID No 2
(GYGIVYKATRKTDGTEFAIK) is located in two highly conserved
domains of protein kinases, Domain I and Domain II (Hanks et al.,
1988, Science 241, 42-52). The sequence GYGIV is part of the
nucleotide (ATP) binding domain, also known as Domain I in
protein kinases. Domain I is part of the catalytic domain of
protein kinases. The Glycines (G) are believed to form an elbow
around the nucleotide, and the Valine (V) is believed to
contribute to the positioning of the Glycines. The first Glycine
and the Valine are invariant in all protein kinases. The second
Glycine is almost invariant.

The sequence AIK in the same peptide is also highly
conserved and is located in Domain II, which is also part
of the catalytic domain. The Alanine (A) and the Lysine (K) are
invariant in all kinases, and the Isoleucine (I) is highly
conserved. The Lysine residue appears to be involved in
mediating the phosphotransfer reaction (Hanks et al., 1988).
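
To make the positions of the two motifs concrete, the short Python sketch below locates GYGIV and AIK within SEQ ID No 2 and, using the residue range AA 411-430 stated elsewhere in this description for SEQ ID No 2 within SEQ ID NO 1, converts them to full-length coordinates. It is an illustrative aid only; the helper name `locate` is hypothetical.

```python
SEQ_ID_NO_2 = "GYGIVYKATRKTDGTEFAIK"

# First residue of SEQ ID No 2 within SEQ ID NO 1, taken from the
# range AA 411-430 given in this description.
FIRST_RESIDUE_IN_SEQ_ID_NO_1 = 411


def locate(motif, peptide, first_residue=1):
    """Return the 1-based (start, end) of the first occurrence of motif.

    `first_residue` is the number assigned to the first residue of
    `peptide`; None is returned when the motif is absent.
    """
    index = peptide.find(motif)
    if index < 0:
        return None
    start = first_residue + index
    return start, start + len(motif) - 1


if __name__ == "__main__":
    for motif in ("GYGIV", "AIK"):
        print(motif,
              locate(motif, SEQ_ID_NO_2),
              locate(motif, SEQ_ID_NO_2, FIRST_RESIDUE_IN_SEQ_ID_NO_1))
```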

This exon is responsible for the kinase activity of CDC7.
This implies that the CDC7 coding sequence from the state
of the art, which lacks this exon, is not functional.

The novel exon encoding amino acid sequence SEQ ID No
3 (DVIEKKDGPCSGTKGFRAPE) is part of Domain VIII of protein
kinases. Mutagenesis has implicated this domain in
the catalytic activity (Hanks et al., 1988). In the sequence
TKGFRAPE, the amino acids Threonine (T), Phenylalanine (F) and
Alanine (A) are highly conserved, and the Glutamic Acid (E) is
invariant. Moreover, substitution of the corresponding
threonine in the yeast CDC7 homologue (position 281 of the
yeast CDC7; position 722 in SEQ ID No 1) by a glutamate
resulted in a dominant-negative CDC7 mutant (Ohtoshi et al.
1997, Mol Gen Genet 254, 562-570).

The novel exon encoding amino acid sequence SEQ ID
No 4 (NIKDIAQLRGSEELWEVAKLHNRESSFPK) is located in Domain XI
of protein kinases. In this peptide, the first Leucine
(L) and the second Lysine (K) are highly conserved and
are therefore believed to be quite important for the correct
activity of the protein.



In addition, using degenerate CDC27 oligonucleotides,
an Arabidopsis thaliana cDNA sequence was found which, upon
comparison with the above mentioned databanks, showed high
homology with an Arabidopsis thaliana genomic DNA sequence
(accession number AC001645). Again, the coding sequence (SEQ
ID NO 9) found by the present inventors indicated the
presence of two additional coding regions in the Arabidopsis
CDC27 gene, corresponding to the amino acid sequences
given by SEQ ID NOS 6 and 7. Thus, novel DNA replication
modulating proteins in plants were found, comprising one or
more of the above mentioned novel amino acid sequences.

The novel exon encoding amino acid sequence SEQ ID No
6 (VNLQLLiARCYLSNQAYSAYYILK) is part of a unique NH2-terminal
domain conserved in CDC27 homologues of different origin. The
unique domain is located upstream of the NH2-terminal TPR unit
of CDC27 (Tugendreich et al. 1993, Proc Natl Acad Sci USA 90,
10031-10035). The role of this domain is currently not known,
but its conservation suggests that it is indispensable for
CDC27 function. The NH2-terminal TPR of CDC27 is not tandemly
repeated and spans the amino acid residues 174 to 202 in SEQ
ID No 5. Proteins comprising this novel exon sequence
according to the invention may therefore promote APC-substrate
action, thereby allowing DNA replication. On the other
hand, a peptide comprising the novel exon sequence may be used
to occupy the binding region of the substrates for the APC
complex, thereby inhibiting the complex-substrate
interactions, resulting in inactivation of APC and in
polyploidization/endoreduplication.

The novel amino acid sequence SEQ ID No 7
(AYMERLILiPDELVTEENL) is located just after the last (10th) TPR
of CDC27, which spans the amino acid residues 670-703 in SEQ ID No
5. Carboxy-terminal extensions downstream from this 10th TPR,
variable in length and sequence, are common in all known
CDC27 proteins. However, the sequence SEQ ID No 7 shows 50 and
55% homology to the corresponding regions of the CDC27
homologues of Schizosaccharomyces pombe and Aspergillus
nidulans, respectively. Moreover, and previously not
recognized, the 25 carboxy-terminal amino acids (ending with
SEQ ID No 7) immediately downstream of the 10th TPR compose a
sequence unit sharing characteristics of a TPR domain: 1)
secondary structure prediction using the Chou-Fasman algorithm
(Chou and Fasman 1978, Annu Rev Biochem 47, 251-276) reveals
the possibility of the 25 amino acid stretch forming an
α-helix, 2) applying the Eisenberg algorithm (Eisenberg 1984,
Annu Rev Biochem 53, 595-623) furthermore predicts the
existence of two amphipathic domains within the α-helix formed
by the same 25 amino acid sequence, 3) a truncated TPR of 27
amino acids exists in the SKI3 antiviral protein of
Saccharomyces cerevisiae (Rhee et al. 1989, Yeast 5, 149-158).
Remarkably, three consecutive core amino acids of this TPR,
RLI, are also present in SEQ ID No 7 and, although very
limited, some further homology can be discovered. Thus,
although circumstantial, these data may suggest that SEQ ID No
7 is part of a truncated TPR. If so, the block of tandemly
repeated TPRs in CDC27 should be extended from 9 (spanning
amino acids 406 to 703 in SEQ ID No 5) to 10 (amino acids 704
to 728 in SEQ ID No 5). Interestingly, it has been suggested
that a dimer of the basic 34 amino acid TPR repeat is the more
common evolutionary unit (Sikorski et al. 1990, Cell 60,
307-317).

The effect of mutations in one out of the tandem series of TPRs
can be very specific. For instance, a point mutation in the
most highly conserved 7th TPR domain of yeast CDC27 results in
a greatly reduced affinity for interaction with yeast CDC23,
but not for interaction with yeast CDC16 or wild-type CDC27.
A single amino acid insertion in the same domain destroys the
α-helix and abolishes interaction with wild-type CDC27 as well
as CDC16 (Lamb et al. 1994, EMBO J 13, 4321-4328). Moreover,
detailed experiments with the human TPR-containing CDC16 and
CDC27 homologues and another TPR-containing protein regulating
the APC activity, PP5, revealed that TPR proteins display
discriminate binding to other TPR proteins. More specifically
for CDC27, deletion of the first TPR domain in this protein
abolishes CDC16 binding, but not PP5 binding (Ollendorf and
Donoghue 1997, J Biol Chem 272, 32011-32018). Mutagenesis
studies with the yeast CDC23 showed that mutations of only a few
residues in or near the most canonical 6th TPR unit result in
temperature-sensitive defects (Sikorski et al. 1993, Mol Cell
Biol 13, 1212-1221). Separate TPR domains thus seem to be
involved in specific interactions with other proteins, and only
very limited alterations in these domains seem to be tolerated.
Any erroneous modulation of APC activity, e.g. by mutations in
SEQ ID No 6 as part of a conserved sequence in CDC27 proteins
and/or SEQ ID No 7 being a putative novel truncated TPR motif
in CDC27, will likely result in loss of control over normal DNA
replication cycles via the mechanisms described above.
Mutations in CDC27 can indeed trigger DNA overreplication and
thus the generation of polyploid cells (Heichman and Roberts
1996, Cell 85, 39-48). Such endoreduplication might be related
to cell expansion (Traas et al. 1998, Curr Opin Plant Biol 1,
498-503) and, thus, a higher storage capacity in such polyploid
cells. This advantageous property is highly desired in crop
plants or parts of plants such as seeds, roots, tubers and
fruits.

Modulating the said amino acid sequence, e.g. by mutation, would
impair the formation of functional APC, whereas CDC27 comprising
such a mutation would still be able to interact with the substrate,
thereby titrating the substrate out, leading to the
abolishment of APC function in the plant cell and resulting in
polyploid cells.

It is to be understood that DNA replication modulating
proteins according to the present invention, comprising one or
more of the above mentioned amino acid sequences, or having 80%
amino acid identity therewith, may originate from plant species
as well as from other species, as long as the said proteins are
capable of modulating DNA replication in one or more plant
species.

The term "protein" is to be understood as any amino acid
sequence having a biological function, optionally modified by
e.g. glycosylation. The protein according to the present
invention preferably comprises one or more of the amino acid
sequences according to c) or d), the respective amino acid
identity preferably being at least 50%.

The term "protein" includes single-chain polypeptide
molecules as well as multiple-polypeptide complexes where
individual constituent polypeptides are linked by covalent or
non-covalent means. The term "polypeptide" includes peptides
of two or more amino acids in length, typically having more
than 5, 10 or 20 amino acids.

It will be understood that amino acid sequences of the
invention are not limited to the sequences obtained from the
particular protein but also include homologous sequences
obtained from any source, for example related plant proteins,
cellular homologues and synthetic peptides, as well as variants
or derivatives thereof.

Thus, the present invention covers variants, homologues
or derivatives of the amino acid sequences of the present
invention, as well as variants, homologues or derivatives of
the nucleotide sequence coding for the amino acid sequences of
the present invention.

In the context of the present invention, a homologous
sequence is taken to include an amino acid sequence which is
at least 50, 60, 70, 80 or 90% identical, preferably at least
95 or 98% identical, at the amino acid level over at least 18,
preferably all, amino acids within the sequences as shown in SEQ
ID Nos 2, 3, 4, 6 and 7 in the sequence listing herein. In
particular, homology should typically be considered with
respect to those regions of the sequence known to be essential
for the above discussed functions of the novel amino acid
sequences rather than non-essential neighbouring sequences.
Although homology can also be considered in terms of similarity
(i.e. amino acid residues having similar chemical properties/
functions), in the context of the present invention it is
preferred to express homology in terms of sequence identity.

Homology comparisons can be conducted by eye, or more 
usually, with the aid of readily available sequence comparison 
programs. These commercially available computer programs can 
calculate % homology between two or more sequences. 

% Homology may be calculated over contiguous sequences,
i.e. one sequence is aligned with the other sequence and each
amino acid in one sequence is directly compared with the
corresponding amino acid in the other sequence, one residue at
a time. This is called an "ungapped" alignment. Typically, such
ungapped alignments are performed only over a relatively short
number of residues (for example less than 50 contiguous amino
acids).
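
The ungapped comparison described above amounts to a residue-by-residue count. The following Python sketch shows one way to compute it; the function name and the variant peptide are hypothetical and serve only as an illustration, with the variant being SEQ ID No 2 carrying two arbitrary substitutions.

```python
def ungapped_identity(seq_a, seq_b):
    """Percent identity of two equal-length sequences, compared
    position by position without introducing gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    seq_id_no_2 = "GYGIVYKATRKTDGTEFAIK"
    # Hypothetical variant: K7 replaced by R and T12 replaced by S.
    variant = "GYGIVYRATRKSDGTEFAIK"
    print(ungapped_identity(seq_id_no_2, variant))
```

With 18 of 20 positions identical, the hypothetical variant would be 90% identical to SEQ ID No 2 over all 20 residues.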

Although this is a very simple and consistent method, it
fails to take into consideration that, for example, in an
otherwise identical pair of sequences, one insertion or
deletion will cause the following amino acid residues to be put
out of alignment, thus potentially resulting in a large
reduction in % homology when a global alignment is performed.
Consequently, most sequence comparison methods are designed to
produce optimal alignments that take into consideration
possible insertions and deletions without penalising unduly the
overall homology score. This is achieved by inserting "gaps"
in the sequence alignment to try to maximise local homology.

However, these more complex methods assign "gap
penalties" to each gap that occurs in the alignment so that,
for the same number of identical amino acids, a sequence
alignment with as few gaps as possible - reflecting higher
relatedness between the two compared sequences - will achieve
a higher score than one with many gaps. "Affine gap costs" are
typically used that charge a relatively high cost for the
existence of a gap and a smaller penalty for each subsequent
residue in the gap. This is the most commonly used gap scoring
system. High gap penalties will of course produce optimised
alignments with fewer gaps. Most alignment programs allow the
gap penalties to be modified. However, it is preferred to use
the default values when using such software for sequence
comparisons. For example, when using the GCG Wisconsin Bestfit
package (see below) the default gap penalty for amino acid
sequences is -12 for a gap and -4 for each extension.
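
The effect of affine gap costs can be illustrated with a few lines of Python. The sketch below uses the values quoted above (12 for the existence of a gap, 4 per extension) and assumes, as one possible convention, that every residue in a gap, including the first, is charged the extension cost; alignment programs differ on this detail, so the numbers are indicative only. The function names are hypothetical.

```python
import re

GAP_OPEN = 12    # cost charged once for the existence of a gap
GAP_EXTEND = 4   # cost charged per residue in the gap (assumed convention)


def affine_gap_penalty(gap_length, gap_open=GAP_OPEN, gap_extend=GAP_EXTEND):
    """Penalty for a single gap of the given length."""
    if gap_length <= 0:
        return 0
    return gap_open + gap_extend * gap_length


def total_gap_penalty(aligned_a, aligned_b,
                      gap_open=GAP_OPEN, gap_extend=GAP_EXTEND):
    """Sum the affine penalties of all gaps ('-' runs) in an alignment."""
    total = 0
    for row in (aligned_a, aligned_b):
        for run in re.finditer(r"-+", row):
            total += affine_gap_penalty(len(run.group()), gap_open, gap_extend)
    return total


if __name__ == "__main__":
    # One gap of three residues versus three gaps of one residue each.
    print(total_gap_penalty("ACD---EFG", "ACDKLMEFG"))
    print(total_gap_penalty("A-C-D-E", "AKCKDKE"))
```

Under these assumptions a single three-residue gap is penalised less than three separate one-residue gaps, which is the behaviour the affine scheme is meant to produce.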

Calculation of maximum % homology therefore firstly
requires the production of an optimal alignment, taking into
consideration gap penalties. A suitable computer program for
carrying out such an alignment is the GCG Wisconsin Bestfit
package (University of Wisconsin, U.S.A.; Devereux et al.,
1984, Nucleic Acids Research 12:387). Examples of other
software that can perform sequence comparisons include, but are
not limited to, the BLAST package (see
http://www.ncbi.nih.gov/BLAST/), FASTA (Altschul et al., 1990,
J. Mol. Biol., 403-410; FASTA is available for online searching
at, for example, http://www.2.ebi.ac.uk.fasta3) and the
GENEWORKS suite of comparison tools. However, it is preferred
to use the GCG Bestfit program.

Although the final % homology can be measured in terms of
identity, the alignment process itself is typically not based
on an all-or-nothing pair comparison. Instead, a scaled
similarity score matrix is generally used that assigns scores
to each pairwise comparison based on chemical similarity or
evolutionary distance. An example of such a matrix commonly
used is the BLOSUM62 matrix - the default matrix for the BLAST
suite of programs. GCG Wisconsin programs generally use either
the public default values or a custom symbol comparison table
if supplied (see user manual for further details). It is
preferred to use the public default values for the GCG package,
or in the case of other software, the default matrix, such as
BLOSUM62.

Once the software has produced an optimal alignment, it 
is possible to calculate % homology, preferably % sequence 
identity. The software typically does this as part of the 
sequence comparison and generates a numerical result. 
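
For illustration, the final step can be sketched in Python as follows. The sketch assumes that an alignment has already been produced by one of the programs mentioned above and is supplied as two equal-length strings in which "-" marks a gap. It further assumes that identity is expressed relative to the aligned columns in which neither sequence has a gap; programs differ in the denominator they report, so this is one possible convention. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # columns containing a gap are not compared
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment: six gap-free columns, five of them identical.
    print(round(percent_identity("ACD-EFG", "ACDKEYG"), 1))
```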


Polypeptide Variants and Derivatives 

The terms "variant" or "derivative" in relation to the
amino acid sequences of the present invention include any
substitution of, variation of, modification of, replacement of,
deletion of or addition of one (or more) amino acids from or
to the sequence, provided that the resultant amino acid sequence
has an activity similar to that of the polypeptides presented in
the sequence listings.

The sequences of the invention may be modified for use in
the present invention. Typically, modifications are made that
maintain the activity of the sequence. Amino acid substitutions
may be made, for example from 1, 2 or 3 to 10, 20 or 30
substitutions, provided that the modified sequence retains the
relevant activity. E.g. the kinase activity should be
maintained in such a variant of a peptide according to the
invention comprising SEQ ID NO 2. Amino acid substitutions may
include the use of non-naturally occurring analogues, for
example to increase blood plasma half-life of a therapeutically
administered polypeptide.

Conservative substitutions may be made, for example
according to the Table below. Amino acids in the same block in
the second column and preferably in the same line in the third
column may be substituted for each other:

ALIPHATIC    Non-polar            G A P
                                  I L V
             Polar - uncharged    C S T M
                                  N Q
             Polar - charged      D E
                                  K R
AROMATIC                          H F W Y
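
The table of conservative substitutions can be encoded as a small lookup structure. The Python sketch below is an illustrative aid, not part of the original disclosure: it reports whether two residues share a line of the third column (the preferred case) or only a block of the second column. The names `SUBSTITUTION_TABLE` and `classify_substitution` are hypothetical.

```python
# Each entry is (class, block in the second column, line in the third column).
SUBSTITUTION_TABLE = [
    ("ALIPHATIC", "Non-polar", "GAP"),
    ("ALIPHATIC", "Non-polar", "ILV"),
    ("ALIPHATIC", "Polar - uncharged", "CSTM"),
    ("ALIPHATIC", "Polar - uncharged", "NQ"),
    ("ALIPHATIC", "Polar - charged", "DE"),
    ("ALIPHATIC", "Polar - charged", "KR"),
    ("AROMATIC", "", "HFWY"),
]


def _lookup(residue):
    """Return (line index, (class, block)) for a one-letter residue code."""
    for index, (cls, block, residues) in enumerate(SUBSTITUTION_TABLE):
        if residue in residues:
            return index, (cls, block)
    raise KeyError("unknown residue: %r" % residue)


def classify_substitution(old, new):
    """Return 'same line', 'same block' or None for a substitution."""
    line_old, block_old = _lookup(old)
    line_new, block_new = _lookup(new)
    if line_old == line_new:
        return "same line"
    if block_old == block_new:
        return "same block"
    return None


if __name__ == "__main__":
    for pair in (("I", "L"), ("G", "I"), ("D", "K"), ("D", "F")):
        print(pair, classify_substitution(*pair))
```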



Proteins of the invention are typically made by
recombinant means. However, they may also be made by synthetic
means using techniques well known to skilled persons, such as
solid phase synthesis. Proteins of the invention may also be
produced as fusion proteins, for example to aid in extraction
and purification. Examples of fusion protein partners include
glutathione-S-transferase (GST), 6xHis, GAL4 (DNA binding
and/or transcriptional activation domains) and β-galactosidase.
It may also be convenient to include a proteolytic cleavage
site between the fusion protein partner and the protein
sequence of interest to allow removal of fusion protein
sequences. Preferably the fusion protein will not hinder the
function of the protein sequence of interest.

Proteins of the invention may be in a substantially
isolated form. It will be understood that the protein may be
mixed with carriers or diluents which will not interfere with
the intended purpose of the protein and still be regarded as
substantially isolated. A protein of the invention may also be
in a substantially purified form, in which case it will
generally comprise the protein in a preparation in which more
than 90%, e.g. 95%, 98% or 99%, of the protein in the
preparation is a protein of the invention.

In a special embodiment, the protein according to the
present invention comprises the amino acid sequence as given
in SEQ ID NO 1 or NO 5, or has at least 80%, preferably at least
90%, amino acid identity with one of the said sequences. SEQ ID
NO 1 relates to the complete amino acid sequence (889 AA) of
the novel CDC7 protein according to the present invention
comprising SEQ ID NOS 2, 3 and 4 (AA 411-430, 710-729 and
767-795, respectively). SEQ ID NO 5 is the complete amino acid
sequence (728 AA) of the novel plant CDC27 comprising SEQ ID
NOS 6 and 7 (AA 37-60 and AA 710-727, respectively).

Although the proteins according to the present invention
may be of non-plant origin, as is indicated above, the protein
according to the present invention is preferably a plant
protein, more preferably a CDC7 or CDC27 protein, or a
functional analogue thereof. A functional analogue is to be
understood as any protein or peptide having similar biological
effects as a plant CDC7 protein or a CDC27 protein,
irrespective of the origin thereof.

Mutein 

In another embodiment, the present invention relates to 
a mutein of the protein according to the present invention, 
said mutein comprising at least one amino acid substitution, 
deletion or addition, affecting the DNA replicative effect of 
the said protein. 

As is already indicated above, the proteins according to 
the present invention are of high interest for an improvement 
of e.g. agricultural crops or parasite resistance. By 
substituting, deleting or adding amino acids to the protein 
according to the present invention, the modulating effect 
thereof can be affected, which may lead to desirable or 
improved properties of the protein. 

In particular, DNA replication modulating proteins
according to the invention may be activated or deactivated by
a phosphorylation-dephosphorylation mechanism, which is a known
regulatory mechanism for many cell cycle proteins. Therefore,
in a further embodiment of the present invention, one of the
phosphorylatable amino acids of the protein according to the
present invention is deleted or substituted by one or more
non-phosphorylatable amino acids, which may lead to loss of
susceptibility to phosphorylation and of function.
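
By way of illustration, candidate positions for such a substitution can be listed programmatically. The Python sketch below treats serine, threonine and tyrosine as the phosphorylatable residues and uses alanine as a non-phosphorylatable replacement; both choices are common conventions assumed here for the example and are not prescribed by this description. The function names are hypothetical, and the example uses SEQ ID No 3 numbered from 1.

```python
# Assumed set of phosphorylatable residues (Ser, Thr, Tyr).
PHOSPHORYLATABLE = frozenset("STY")


def phosphorylatable_sites(sequence, first_residue=1):
    """List (position, residue) for every S, T or Y in the sequence."""
    return [(first_residue + i, aa)
            for i, aa in enumerate(sequence)
            if aa in PHOSPHORYLATABLE]


def substitute(sequence, position, new_residue, first_residue=1):
    """Return a copy of the sequence with one residue replaced."""
    index = position - first_residue
    if not 0 <= index < len(sequence):
        raise IndexError("position %d lies outside the sequence" % position)
    return sequence[:index] + new_residue + sequence[index + 1:]


if __name__ == "__main__":
    seq_id_no_3 = "DVIEKKDGPCSGTKGFRAPE"
    print(phosphorylatable_sites(seq_id_no_3))
    # Replace the threonine at position 13 by alanine (assumed choice).
    print(substitute(seq_id_no_3, 13, "A"))
```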

In particular, the said substitutions, deletions or
additions may be situated within or flanking the amino acid
sequences given by SEQ ID NOS 2, 3, 4, 6 or 7 (or sequences
having at least 50% amino acid identity therewith).

DNA replication modulating proteins according to the
invention may also comprise one or more tetratricopeptide
repeat (TPR) domains. Such domains have been identified in
CDC27 (amino acid regions 174-202, 403-431, 432-465, 466-499,
500-533, 534-567, 568-601, 602-635, 636-669, 670-703 in SEQ ID
No 5; delineation of regions based on the yeast CDC27
homologue; Lamb et al. 1994, EMBO J 13, 4321-4328) as well as
in CDC16, CDC23 and many other proteins (Goebl and Yanagida
1991, Trends Biochem Sci 16, 173-177). The function of these
TPR domains is to enable the protein to interact with other
proteins in the anaphase promoting complex (APC). In the
CDC27 protein according to the present invention, a novel TPR
or TPR-like domain has been identified which includes SEQ ID
No 7. Mutation analysis in TPR domains of yeast CDC27 has
revealed that intact TPRs are necessary for CDC27 function
(Lamb et al. 1994, EMBO J 13, 4321-4328) and, thus, also for
a functional APC. In the absence of CDC27 function, DNA
synthesis becomes uncoupled from cell cycle progression,
resulting in the establishment of polyploid cells (Heichman and
Roberts 1996, Cell 85, 39-48).

Peptides 

Further, the present invention relates to a peptide
comprising

a) one or more of the amino acid sequences chosen
from the group consisting of those given by SEQ ID NOS 2,
3 and 4,

b) one or more of the amino acid sequences chosen
from the group consisting of those given by SEQ ID NOS
6 and 7,

c) one or more amino acid sequences having at
least 80% amino acid identity with those of a), or

d) one or more amino acid sequences having at
least 80% amino acid identity with those of b).

These peptides, firstly identified by the present 
inventors, are or maybe part of important regulatory sites for 
binding cellular factors or being a substrate for activating/ 
deactivating mechanisms, such as phosphorylation. 
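The 80% identity criterion in items c) and d) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with "-" for gaps) and counts identical columns over all aligned columns; other denominators are in use, so this is only one possible convention. The function names and the toy sequences in the test are invented for illustration and are not SEQ ID sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns, skipping columns that are gaps in both."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    # A column counts as a match only if both residues are identical and not a gap.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair reaches the identity threshold (80% in items c and d)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For two hypothetical ten-residue peptides differing at one position, the sketch gives 90% identity, which satisfies the threshold; three differences give 70%, which does not.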

Antibodies 

Furthermore, the present invention relates to antibodies 
specifically recognizing a cell cycle interacting protein 
according to the invention or parts, i.e. specific fragments
or epitopes, of such a protein. The antibodies of the invention
can be used to identify and isolate other cell cycle
interacting proteins and genes in any organism, preferably
plants. These antibodies can be monoclonal antibodies,
polyclonal antibodies or synthetic antibodies as well as
fragments of antibodies, such as Fab, Fv or scFv fragments etc.
Monoclonal antibodies can be prepared, for example, by the
techniques as originally described in Köhler and Milstein,
Nature 256 (1975), 495, and Galfre, J. Meth. Enzymol. 73
(1981), 3, which comprise the fusion of mouse myeloma cells to
spleen cells derived from immunized mammals. Furthermore,
antibodies or fragments thereof to the aforementioned peptides
can be obtained by using methods which are described, e.g., in
Harlow and Lane "Antibodies, A Laboratory Manual", CSH Press,
Cold Spring Harbor, 1988. These antibodies can be used, for
example, for the immunoprecipitation and immunolocalization of
proteins according to the invention as well as for the
monitoring of the synthesis of such proteins, for example, in
recombinant organisms, and for the identification of compounds
interacting with the protein according to the invention. For
example, surface plasmon resonance as employed in the BIAcore
system can be used to increase the efficiency of phage
antibody selections, yielding a high increment of affinity
from a single library of phage antibodies which bind to an
epitope of the protein of the invention (Schier, Human
Antibodies Hybridomas 7 (1996), 97-105; Malmborg, J. Immunol.
Methods 183 (1995), 7-13). In many cases, the binding phenomena
of antibodies to antigens are equivalent to other
ligand/anti-ligand binding.

DNA sequences 

Further, the present invention relates to a non-genomic
DNA sequence, coding for a protein or mutein or peptide
according to the present invention, or a DNA sequence having
a sequence homology of at least 75% with the said sequence, or
to the complementary sequence thereof. Also DNA sequences
having at least 75% homology with the above mentioned DNA
sequences are encompassed within the invention. These sequences
are particularly useful in the generation of DNA vectors to
multiply the DNA sequence or to introduce the said sequence in
a host organism, in order to obtain the encoded protein.
Further, said sequences or parts thereof are advantageously used
to identify and isolate homologous sequences from other
biological species.

The DNA sequence is preferably substantially free of 
sequences intervening the coding sequence, and is preferably 
cDNA. 

DNA sequences of the invention comprise nucleic acid
sequences encoding the amino acid sequences of the invention. 
It will be understood by a skilled person that numerous 
different polynucleotides can encode the same polypeptide as 
a result of the degeneracy of the genetic code. In addition, 
it is to be understood that skilled persons may, using routine 
techniques, make nucleotide substitutions that do not affect 
the polypeptide sequence encoded by the polynucleotides of the 
invention to reflect the codon usage of any particular host 
organism in which the polypeptides of the invention are to be 
expressed. 
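The degeneracy of the genetic code mentioned above can be made concrete with a short sketch that builds the standard codon table and counts how many distinct coding sequences encode a given peptide. The standard genetic code is assumed; the function names and the example peptide "MLW" are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_encodings(peptide):
    """Number of distinct coding sequences that translate to the peptide."""
    total = 1
    for aa in peptide:
        total *= len(synonymous_codons(aa))
    return total


def translate(dna):
    """Translate a DNA coding sequence codon by codon (trailing bases ignored)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))
```

For example, ATGCTGTGG and ATGTTATGG both translate to the same tripeptide, which is the kind of silent substitution that can be used to match the codon usage of a host organism.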

Polynucleotides of the invention may comprise DNA or RNA.
They may be single-stranded or double-stranded. They may also
be polynucleotides which include within them synthetic or
modified nucleotides. A number of different types of
modification to oligonucleotides are known in the art. These
include methylphosphonate and phosphorothioate backbones, and
the addition of acridine or polylysine chains at the 3' and/or
5' ends of the molecule. For the purposes of the present
invention, it is to be understood that the polynucleotides 
described herein may be modified by any method available in the 
art. Such modifications may be carried out in order to enhance 
the in vivo activity or life span of polynucleotides of the 
invention. 

The terms "variant", "homologue" or "derivative" in
relation to the nucleotide sequence of the present invention
include any substitution of, variation of, modification of,
replacement of, deletion of or addition of one (or more)
nucleic acid from or to the sequence, provided that the
resultant nucleotide sequence codes for a polypeptide,
preferably having at least the same activity as sequences
presented in the sequence listings.

As indicated above, with respect to sequence homology,
preferably there is at least 75%, more preferably at least 85%,
more preferably at least 90% homology to the sequences shown
in the sequence listing herein. More preferably there is at
least 95%, more preferably at least 98%, homology. Nucleotide
homology comparisons may be conducted as described above. A
preferred sequence comparison program is the GCG Wisconsin
Bestfit program described above. The default scoring matrix has
a match value of 10 for each identical nucleotide and -9 for
each mismatch. The default gap creation penalty is -50 and the
default gap extension penalty is -3 for each nucleotide.
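The scoring parameters quoted above can be applied to an existing alignment as in the sketch below. It scores two pre-aligned nucleotide strings (with "-" for gaps) and does not search for the best alignment, so it is not a reimplementation of Bestfit. It assumes that a gap of n nucleotides costs one creation penalty plus n extension penalties; the text gives only the parameter values, so this convention and the function name are assumptions.

```python
MATCH, MISMATCH = 10, -9              # values quoted in the text
GAP_CREATION, GAP_EXTENSION = -50, -3  # values quoted in the text


def alignment_score(aligned_a, aligned_b):
    """Score a pre-aligned pair of nucleotide strings that use '-' for gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in = None  # sequence holding the currently open gap: "a", "b" or None
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if gap_in != side:
                # Opening a new gap: charge the creation penalty once.
                score += GAP_CREATION
                gap_in = side
            # Assumed convention: every gapped position is charged an extension.
            score += GAP_EXTENSION
        else:
            gap_in = None
            score += MATCH if x == y else MISMATCH
    return score
```

Under this convention, a single two-nucleotide gap costs -56, which outweighs five matches; the penalties therefore strongly favour ungapped alignments between closely related sequences.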

The present invention also encompasses nucleotide 
sequences that are capable of hybridising selectively to the 
sequences presented herein, or any variant, fragment or 
derivative thereof, or to the complement of any of the above. 
Nucleotide sequences are preferably at least 15 nucleotides in 
length, more preferably at least 20, 30, 40 or 50 nucleotides 
in length. 

The term "hybridization" as used herein shall include "the 
process by which a strand of nucleic acid joins with a 
complementary strand through base pairing" as well as the 
process of amplification as carried out in polymerase chain 
reaction technologies. 

Polynucleotides of the invention capable of selectively 
hybridising to the nucleotide sequences presented herein, or 
to their complement, will be generally at least 70%, preferably 
at least 80 or 90% and more preferably at least 95% or 98% 
homologous to the corresponding nucleotide sequences presented 
herein over a region of at least 20, preferably at least 25 or 
30, for instance at least 40, 60 or 100 or more contiguous 
nucleotides. Preferred polynucleotides of the invention will
comprise regions preferably at least 80 or 90% and more
preferably at least 95% homologous to nucleotides (1229-1291),
(2126-2187) or (2298-2385) of SEQ ID No 8 or (109-181) or
(2128-2181) of SEQ ID No 9.

Hybridization conditions are based on the melting
temperature (Tm) of the nucleic acid binding complex, as taught
in Berger and Kimmel (1987, Guide to Molecular Cloning
Techniques, Methods in Enzymology, Vol 152, Academic Press, San
Diego CA), and confer a defined "stringency" as explained
below.

Maximum stringency typically occurs at about Tm-5°C (5°C
below the Tm of the probe); high stringency at about 5°C to
10°C below Tm; intermediate stringency at about 10°C to 20°C
below Tm; and low stringency at about 20°C to 25°C below Tm.
As will be understood by those of skill in the art, a maximum
stringency hybridization can be used to identify or detect
identical polynucleotide sequences while an intermediate (or
low) stringency hybridization can be used to identify or detect
similar or related polynucleotide sequences.
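The stringency ranges above can be expressed as a simple lookup on the difference between the probe Tm and the hybridization temperature. The sketch below is only an illustration: the text gives approximate ranges that share their boundary values, so the choice to assign each boundary to the more stringent class, and the labels "above Tm" and "below low", are assumptions made here.

```python
def stringency_class(tm_celsius, hybridization_celsius):
    """Classify a hybridization temperature by how far it lies below the probe Tm."""
    delta = tm_celsius - hybridization_celsius
    if delta < 0:
        return "above Tm"
    if delta <= 5:
        return "maximum"       # about Tm - 5 degrees C
    if delta <= 10:
        return "high"          # about 5 to 10 degrees C below Tm
    if delta <= 20:
        return "intermediate"  # about 10 to 20 degrees C below Tm
    if delta <= 25:
        return "low"           # about 20 to 25 degrees C below Tm
    return "below low"
```

For a probe with a hypothetical Tm of 80°C, hybridization at 75°C would be classed as maximum stringency and hybridization at 65°C as intermediate stringency.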

In a preferred aspect, the present invention covers
nucleotide sequences that can hybridise to the nucleotide
sequence of the present invention under stringent conditions
(e.g. 65°C and 0.1xSSC {1xSSC = 0.15 M NaCl, 0.015 M Na3
citrate pH 7.0}).
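The salt concentrations implied by a given SSC strength follow directly from the 1xSSC definition in the text. The following sketch performs that scaling; the dictionary and function names are illustrative.

```python
# 1x SSC as defined in the text: 0.15 M NaCl and 0.015 M Na3 citrate.
SSC_1X_MOLAR = {"NaCl": 0.15, "Na3 citrate": 0.015}


def ssc_concentrations(fold):
    """Molar concentrations of each solute at the given SSC strength (e.g. 0.1)."""
    return {solute: fold * molar for solute, molar in SSC_1X_MOLAR.items()}


if __name__ == "__main__":
    for solute, molar in ssc_concentrations(0.1).items():
        print(f"{solute}: {molar:.4f} M")
```

At 0.1xSSC this corresponds to 0.015 M NaCl and 0.0015 M Na3 citrate.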

Where the polynucleotide of the invention is
double-stranded, both strands of the duplex, either individually
or in combination, are encompassed by the present invention.
Where the polynucleotide is single-stranded, it is to be
understood that the complementary sequence of that polynucleotide
is also included within the scope of the present invention.
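The complementary sequence of a single-stranded DNA polynucleotide can be derived mechanically, as in the sketch below. It handles only the four unambiguous DNA bases; the function name is illustrative.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which reflects that either strand of a duplex determines the other.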

Polynucleotides which are not 100% homologous to the
sequences of the present invention but fall within the scope
of the invention can be obtained in a number of ways. Other
variants of the sequences described herein may be obtained for
example by probing DNA libraries made from a range of
individuals, for example individuals from different
populations. In addition, other viral/bacterial, or cellular
homologues, particularly cellular homologues found in plant
cells, may be obtained and such homologues and fragments
thereof in general will be capable of selectively hybridising
to the sequences shown in the sequence listing herein. Such
sequences may be obtained by probing cDNA libraries made from,
or genomic DNA libraries from, other animal species, and probing
such libraries with probes comprising all or part of SEQ ID
Nos 8 or 9 under conditions of medium to high stringency.
Similar considerations apply to obtaining species homologues
and allelic variants of the polypeptide or nucleotide sequences
of the invention.

Variants and strain/species homologues may also be
obtained using degenerate PCR which will use primers designed
to target sequences within the variants and homologues encoding
conserved amino acid sequences within the sequences of the
present invention. Conserved sequences can be predicted, for
example, by aligning the amino acid sequences from several
variants/homologues. Sequence alignments can be performed using
computer software known in the art. For example the GCG
Wisconsin PileUp program is widely used.

The primers used in degenerate PCR will contain one or
more degenerate positions and will be used at stringency
conditions lower than those used for cloning sequences with
single sequence primers against known sequences.
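How many distinct oligonucleotides a degenerate primer represents can be computed from its degenerate positions. The sketch below uses the standard IUPAC nucleotide ambiguity codes; the function names and the example primer in the usage note are hypothetical and are not taken from SEQ ID Nos 8 or 9.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for symbol in primer.upper():
        total *= len(IUPAC[symbol])
    return total


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = (IUPAC[symbol] for symbol in primer.upper())
    return ["".join(bases) for bases in product(*choices)]
```

For the hypothetical primer ATGGAYTGGN, which has one two-fold and one four-fold degenerate position, the degeneracy is 8. Higher degeneracy dilutes each individual sequence in the primer pool, which is one reason such primers are used at reduced stringency.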

Alternatively, such polynucleotides may be obtained by
site directed mutagenesis of characterised sequences, such as
SEQ ID No 8 or 9. This may be useful where for example silent
codon changes are required to sequences to optimise codon
preferences for a particular host cell in which the
polynucleotide sequences are being expressed. Other sequence
changes may be desired in order to introduce restriction enzyme
recognition sites, or to alter the property or function of the
polypeptides encoded by the polynucleotides.

Polynucleotides of the invention may be used to produce
a primer, e.g. a PCR primer, a primer for an alternative
amplification reaction, a probe e.g. labelled with a revealing
label by conventional means using radioactive or
non-radioactive labels, or the polynucleotides may be cloned into
vectors. Such primers, probes and other fragments will be at
least 15, preferably at least 20, for example at least 25, 30
or 40 nucleotides in length, and are also encompassed by the
term polynucleotides of the invention as used herein.

Polynucleotides such as DNA polynucleotides and probes
according to the invention may be produced recombinantly,
synthetically, or by any means available to those of skill in
the art. They may also be cloned by standard techniques.

In general, primers will be produced by synthetic means,
involving a stepwise manufacture of the desired nucleic acid
sequence one nucleotide at a time. Techniques for accomplishing
this using automated techniques are readily available in the
art.

Longer polynucleotides will generally be produced using
recombinant means, for example using PCR (polymerase chain
reaction) cloning techniques. This will involve making a pair
of primers (e.g. of about 15 to 30 nucleotides) flanking a
region of the lipid targeting sequence which it is desired to
clone, bringing the primers into contact with mRNA or cDNA
obtained from an animal or human cell, performing a polymerase
chain reaction under conditions which bring about amplification
of the desired region, isolating the amplified fragment (e.g.
by purifying the reaction mixture on an agarose gel) and
recovering the amplified DNA. The primers may be designed to
contain suitable restriction enzyme recognition sites so that
the amplified DNA can be cloned into a suitable cloning vector.
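Designing primers that carry a restriction enzyme recognition site can be sketched as follows. The recognition sequences for EcoRI, BamHI and HindIII are the well-known ones; the short 5' clamp, the function names and the example annealing sequence in the usage note are hypothetical choices made for illustration and are not specified in the text.

```python
# Recognition sequences of three commonly used restriction enzymes.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def add_site(annealing_sequence, enzyme, clamp="GCGC"):
    """Prefix a primer with a 5' clamp and a recognition site.

    The clamp is a hypothetical run of extra bases placed 5' of the site.
    """
    return clamp + RESTRICTION_SITES[enzyme] + annealing_sequence


def contains_site(sequence, enzyme):
    """True if the recognition site of the enzyme occurs in the sequence."""
    return RESTRICTION_SITES[enzyme] in sequence.upper()
```

For example, adding an EcoRI site to the hypothetical annealing sequence ATGGCTAGCAAAGGAGAA gives GCGCGAATTCATGGCTAGCAAAGGAGAA. In practice one would also check with contains_site that the chosen enzyme does not cut within the region to be cloned.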

For expression of the DNA sequence according to the
invention it may in some instances be advantageous to
incorporate one or more intervening sequences (introns) in the
sequence coding for the protein to be expressed, as in some
expression systems, one or more splicing events must take place
in order to obtain high expression rates (e.g. for expression
of a barley thionin in transgenic tobacco; Carmona et al.
1993, Plant J 3, 457-462).

However, in most cases, the coding sequence (i.e. the
cDNA), accompanied by the proper regulatory elements, such as
promoter and terminator sequences, is sufficient for proper
expression.

In a special embodiment (referring to figs 1 and 2), the
invention relates to a cDNA sequence, comprising the DNA
sequence as given by SEQ ID NO 8 or SEQ ID NO 9, or having a
sequence homology with SEQ ID NO 8 or SEQ ID NO 9 of at least
75%, or being the complementary sequence thereof. SEQ ID NO 8 is
the cDNA sequence of CDC7 of Arabidopsis thaliana, comprising
the coding sequence for the newly identified amino acid
sequences (SEQ ID NOS 2, 3 and 4) as are discussed above. SEQ
ID NO 9 is the cDNA sequence of CDC27 of Arabidopsis thaliana
and includes the sequences coding for the newly identified amino
acid sequences (SEQ ID NOS 6 and 7) as discussed above. The
presence of the amino acid sequences according to the present
invention in DNA replication modulating proteins, in particular
in CDC7 and CDC27 respectively, may play an important role in
the biological function of the said proteins. Also, the
sequences according to SEQ ID NOS 8 and 9, or parts thereof,
can advantageously be used to isolate and identify homologous
sequences of other biological species.

In particular, the invention relates to a non-genomic DNA
sequence, coding for a peptide according to the invention,
corresponding to nucleotides 1229-1291, 2126-2187 or 2298-2385
of SEQ ID NO 8, or to nucleotides 109-181 or 2128-2181 of SEQ
ID NO 9, or a DNA sequence having a sequence homology of at
least 75% to the said sequence or the complementary sequence
thereof. Such a DNA sequence codes for an amino acid sequence
that until now was not known to be part of DNA replication
modulating proteins, in particular of CDC7 and CDC27. It has
now been found that DNA sequences corresponding to nucleotides
1229-1291, 2126-2187 and 2298-2385 of SEQ ID NO 8 code for new
amino acid sequences of plant CDC7. The DNA sequences
corresponding to nucleotides 109-181 and 2128-2181 of SEQ ID
NO 9 code for novel amino acid sequences of plant CDC27 of
Arabidopsis thaliana. Said DNA sequences may therefore in
particular be used to identify and isolate genes or gene
fragments from other plants or organisms that are homologous
to the CDC7 or CDC27 sequence discussed above.
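The sizes of the nucleotide regions named above can be tabulated with a short sketch, which reports for each range its length in nucleotides, the number of whole codons it can hold and any leftover nucleotides. The coordinates are copied from the text as printed; the names REGIONS, span_length and summarize are illustrative.

```python
# Nucleotide ranges as printed in the text (1-based, inclusive).
REGIONS = {
    "SEQ ID NO 8": [(1229, 1291), (2126, 2187), (2298, 2385)],
    "SEQ ID NO 9": [(109, 181), (2128, 2181)],
}


def span_length(start, end):
    """Number of nucleotides in a 1-based inclusive range."""
    return end - start + 1


def summarize(regions):
    """Rows of (sequence, start, end, nucleotides, whole codons, leftover nucleotides)."""
    rows = []
    for seq_id, spans in regions.items():
        for start, end in spans:
            n = span_length(start, end)
            rows.append((seq_id, start, end, n, n // 3, n % 3))
    return rows


if __name__ == "__main__":
    for row in summarize(REGIONS):
        print(row)
```

The five ranges are 63, 62, 88, 73 and 54 nucleotides long. Only the first and last are whole multiples of three (21 and 18 codons), so the other ranges as printed do not span a whole number of codons and their codon counts should be read as approximate.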

Probes and primers 

In a further embodiment, the DNA sequences according to
the invention may be used as primers for use in a nucleic acid
amplification technique. Said primers can be used in a
particular amplification technique to identify and isolate
substantially homologous nucleic acid molecules from other
plant species. The design and use of said primers is known by
the person skilled in the art. Preferably such amplification
primers comprise a contiguous sequence of at least 6
nucleotides, in particular 13 nucleotides, preferably 15 to 25
nucleotides or more, identical or complementary to the
nucleotide sequence encoding the amino acid sequence of SEQ ID
Nos 1-7. Another application is the use as a hybridization
probe to identify nucleic acid molecules hybridizing with a
nucleic acid molecule of the invention by homology screening
of genomic DNA or cDNA libraries. Furthermore, the person
skilled in the art is well aware that it is also possible to
label such a nucleic acid probe with an appropriate marker for
specific applications, such as for the detection of the
presence of a nucleic acid molecule of the invention in a
sample derived from an organism, in particular plants. A number
of companies such as Pharmacia Biotech (Piscataway NJ), Promega
(Madison WI), and US Biochemical Corp (Cleveland OH) supply
commercial kits and protocols for these procedures. Suitable
reporter molecules or labels include radionuclides,
enzymes, fluorescent, chemiluminescent, or chromogenic agents
as well as substrates, cofactors, inhibitors, magnetic
particles and the like.

The nucleic acid sequence for a protein of the invention
can also be used to generate hybridization probes for mapping
the naturally occurring genomic sequence. The sequence may be
mapped to a particular chromosome or to a specific region of
the chromosome using well known techniques. These include in
situ hybridization to chromosomal spreads, flow-sorted
chromosomal preparations, or artificial chromosome
constructions such as yeast artificial chromosomes, bacterial
artificial chromosomes, bacterial P1 constructions or single
chromosome cDNA libraries as reviewed in Price (Blood Rev. 7
(1993), 127-134) and Trask (Trends Genet. 7 (1991), 149-154).

Vectors 

Polynucleotides of the invention can be incorporated into 
a recombinant replicable vector. The vector may be used to 
replicate the nucleic acid in a compatible host cell. Thus in 
a further embodiment, the invention provides a method of making 
polynucleotides of the invention by introducing a 
polynucleotide of the invention into a replicable vector, 
introducing the vector into a compatible host cell, and growing 
the host cell under conditions which bring about replication 
of the vector. The vector may be recovered from the host cell. 
Suitable host cells include bacteria such as E. coli, yeast, 
mammalian cell lines and other eukaryotic cell lines, for 
example insect Sf9 cells. 

Preferably, a polynucleotide of the invention in a vector
is operably linked to a control sequence that is capable of
providing for the expression of the coding sequence by the host
cell, i.e. the vector is an expression vector. The term
"operably linked" means that the components described are in
a relationship permitting them to function in their intended
manner. A regulatory sequence "operably linked" to a coding
sequence is ligated in such a way that expression of the coding
sequence is achieved under conditions compatible with the
control sequences.

The control sequences may be modified, for example by the 
addition of further transcriptional regulatory elements to make 
the level of transcription directed by the control sequences 
more responsive to transcriptional modulators. 

Vectors of the invention may be transformed or transfected
into a suitable host cell as described below to provide for
expression of a protein of the invention. This process may
comprise culturing a host cell transformed with an expression
vector as described above under conditions to provide for
expression by the vector of a coding sequence encoding the
protein, and optionally recovering the expressed protein.

The vectors may be, for example, plasmid or virus vectors
provided with an origin of replication, optionally a promoter
for the expression of the said polynucleotide and optionally
a regulator of the promoter. The vectors may contain one or
more selectable marker genes, for example an ampicillin
resistance gene in the case of a bacterial plasmid or a
neomycin resistance gene for a mammalian vector. Vectors may
be used, for example, to transfect or transform a host cell.

Control sequences operably linked to sequences encoding
the protein of the invention include promoters/enhancers and
other expression regulation signals. These control sequences
may be selected to be compatible with the host cell for which
the expression vector is designed to be used in. The term
promoter is well-known in the art and encompasses nucleic acid
regions ranging in size and complexity from minimal promoters
to promoters including upstream elements and enhancers.

The promoter is typically selected from promoters which
are functional in mammalian cells, although prokaryotic
promoters and promoters functional in other eukaryotic cells
may be used. The promoter is typically derived from promoter
sequences of viral or eukaryotic genes. For example, it may be
a promoter derived from the genome of a cell in which
expression is to occur. With respect to eukaryotic promoters,
they may be promoters that function in a ubiquitous manner
(such as promoters of α-actin, β-actin, tubulin) or,
alternatively, a tissue-specific manner (such as promoters of
the genes for pyruvate kinase). Tissue-specific promoters
specific for selected plant tissue cells are particularly
preferred, see below in section "transgenic plants".

It may also be advantageous for the promoters to be 
inducible so that the levels of expression of the heterologous 
gene can be regulated during the life-time of the cell. 
Inducible means that the levels of expression obtained using
the promoter can be regulated.

In addition, any of these promoters may be modified by the 
addition of further regulatory sequences, for example enhancer 
sequences. Chimeric promoters may also be used comprising 
sequence elements from two or more different promoters
described above.

Therefore, the invention relates to DNA vectors,
particularly plasmids, cosmids, viruses, bacteriophages and
other vectors used conventionally in genetic engineering that
comprise a DNA sequence according to the invention. Methods
which are well known to those skilled in the art can be used
to construct various plasmids and vectors: see for example, the
techniques described in Sambrook, Molecular Cloning A
Laboratory Manual, Cold Spring Harbor Laboratory (1989) N.Y. and
Ausubel, Current Protocols in Molecular Biology, Greene
Publishing Associates and Wiley Interscience, N.Y. (1989),
(1994). Said vector further preferably comprises a promoter,
functional in plant cells, operably linked to the DNA sequence
according to the invention. With such a vector, the DNA
sequence according to the invention can be expressed in plant
cells and may modulate the DNA replication in the said cells.

Identifying derivatives, variants and homologs of the cell
cycle interacting proteins of the invention 

In another embodiment, the present invention relates to
a method for identifying and/or obtaining proteins capable of
modulating the DNA replication in plants, comprising a
two-hybrid screening assay, using CDC27 or CDC7 polynucleotide
sequences as a bait and a cDNA library of a cell suspension
culture as prey.

The yeast two-hybrid assay is a genetic strategy developed
to identify proteins (encoded by the cDNAs, the 'preys') able
to interact in vivo with a known protein (the 'bait').
Interactions between proteins are detected through the
reconstitution of the activity of a transcription activator and
the subsequent expression of a reporter gene. The cell culture
may be from any organism possessing cell cycle interacting
proteins such as animals, preferably mammals. Particularly
preferred are plant cell suspension cultures such as from
Arabidopsis. The nucleic acid molecules encoding proteins or
peptides identified to interact with CDC7 or CDC27 in the above
mentioned assay can be easily obtained and sequenced by methods
known in the art. Therefore, the present invention also relates
to a DNA sequence encoding a cell cycle interacting protein
obtainable by the method of the invention.

Transgenic plants 

To analyse the industrial applicabilities of the
invention, transformed plants can be made using the nucleotide
sequences according to the invention. Such a transformation with
the new gene(s), proteins or inactivated variants/muteins
thereof will affect cell division either positively or
negatively. Methods to modify the expression levels and/or
the activity are known to persons skilled in the art and
include for instance overexpression, co-suppression, the use
of ribozymes, sense and anti-sense strategies, and gene silencing
approaches. "Sense strand" refers to the strand of a
double-stranded DNA molecule that is homologous to a mRNA
transcript thereof. The "anti-sense strand" contains an inverted
sequence which is complementary to that of the "sense strand".

Hence, the nucleic acid molecules according to the
invention are in particular useful for the genetic manipulation
of plant cells in order to modify the characteristics of plants
and to obtain plants with modified, preferably with improved
or useful phenotypes. Similarly, the invention can also be used
to modulate the cell division and the growth of cells,
preferentially plant cells, in in vitro cultures. A transformed
plant can thus be obtained by transforming a plant cell with
a gene encoding a polypeptide concerned or fragment thereof
alone or in combination. For this purpose tissue specific
promoters, in one construct or being present as a separate
construct in addition to the sequence concerned, can be used.

Thus, the present invention relates to a method for the 
production of transgenic plants, plant cells or plant tissue 
comprising the introduction of a nucleic acid molecule or 
vector of the invention into the genome of said plant, plant 
cell or plant tissue. 

The invention further relates to a method for modulating 
DNA replication in plant cells, plant parts or plants by 
conferring to one or more plant cells the capacity to provide 
a protein, or a mutein thereof according to the invention, in 
an amount sufficient to modulate DNA replication and/or to 
block mitosis of the said cells. 

In particular, the said capacity is conferred to one or 
more plant cells, by 

a) transforming one or more plant cells with DNA
according to the invention or with a vector according to
the invention,

b) maintaining or culturing the plant cells in order to
regenerate plant parts or plants from the transformed
cells, and

c) incubating the cells, plant parts or plants at
conditions allowing expression of the DNA according to
claim 9 or 10, to produce a protein according to the
invention or a mutein thereof according to the invention.

For the expression of the nucleic acid molecules according to
the invention in sense or antisense orientation in plant cells,
the molecules are placed under the control of regulatory
elements which ensure the expression in plant cells. These
regulatory elements may be heterologous or homologous with
respect to the nucleic acid molecule to be expressed as well as
with respect to the plant species to be transformed. In general,
such regulatory elements comprise a promoter active in plant
cells. To obtain expression in all tissues of a transgenic
plant, preferably constitutive promoters are used, such as the
35S promoter of CaMV (Odell, Nature 313 (1985), 810-812) or
promoters of the polyubiquitin genes of maize (Christensen,
Plant Mol. Biol. 18 (1992), 675-689). In order to achieve
expression in specific tissues of a transgenic plant it is
possible to use tissue specific promoters (see, e.g., Stockhaus,
EMBO J. 8 (1989), 2245-2251). Also known are promoters which are
specifically active in tubers of potatoes or in seeds of
different plant species, such as maize, Vicia, wheat, barley
etc. Inducible promoters may be used in order to be able to
exactly control expression. Examples of inducible promoters are
the promoters of genes encoding heat shock proteins. Also
microspore-specific regulatory elements and their uses have been
described (WO96/16182). Furthermore, the chemically inducible
Tet-system may be employed (Gatz, Mol. Gen. Genet. 227 (1991),
229-237). Further suitable promoters are known to the person
skilled in the art and are described, e.g., in Ward (Plant Mol.
Biol. 22 (1993), 361-366). The regulatory elements may further
comprise transcriptional and/or translational enhancers
functional in plant cells. Furthermore, the regulatory elements
may include transcription termination signals, such as a poly-A
signal, which lead to the addition of a poly A tail to the
transcript which may improve its stability.

Methods for the introduction of foreign DNA into plants
are also well known in the art. These include, for example, the
transformation of plant cells or tissues with T-DNA using
Agrobacterium tumefaciens or Agrobacterium rhizogenes, the
fusion of protoplasts, direct gene transfer (see, e.g., EP-A
164 575), injection, electroporation, biolistic methods like
particle bombardment, pollen-mediated transformation, plant RNA
virus-mediated transformation, liposome-mediated
transformation, transformation using wounded or enzyme-degraded
immature embryos, or wounded or enzyme-degraded embryogenic
callus and other methods known in the art.

In general, the plants which can be modified according to
the invention and which either show overexpression of a protein
according to the invention or a reduction of the synthesis of
such a protein can be derived from any desired plant species.
They can be monocotyledonous plants or dicotyledonous plants,
preferably they belong to plant species of interest in
agriculture, wood culture or horticulture, such as
crop plants (e.g. maize, rice, barley, wheat, rye, oats etc.),
potatoes, oil producing plants (e.g. oilseed rape, sunflower,
peanut, soybean, etc.), cotton, sugar beet, sugar cane,
leguminous plants (e.g. beans, peas etc.), wood producing
plants, preferably trees, etc. The invention further relates
to progeny of such plants and to plant material such as roots,
flowers, fruit, leaves, pollen, seeds, seedlings or tubers,
obtainable from the plant according to the invention.

The invention further relates to a plant cell, transformed 
with a vector according to the present invention, or comprising 
DNA according to the present invention. The invention also 
relates to plants, obtainable by the method according to the 
present invention and to progeny of such a plant and to plant 
material, such as roots, flowers, fruit, leaves, pollen, seeds, 
seedlings or tubers, obtainable from the plant according to the 
invention. 

Mutants 

In further embodiments of the invention, expression of
dominant negative mutants of CDC7 or CDC27 is used to modulate
DNA replication in plant cells, plant tissues, plant organs
and/or whole plants. These embodiments involve the
overexpression of a mutein or mutant gene according to the
present invention which will inhibit the function of a
wild-type allele when expressed in the same cell, so that the
phenotype of a transgenic plant, plant organ or plant cell
expressing the mutant will be that of a blocked cell cycle
progression.

Herskowitz, Nature 329: 219-222 (1987), reviews the 
inactivation of genes by interference at the protein level, 
which is achieved through the expression of specific genetic 
elements encoding a polypeptide comprising both intact, 
functional domains of the wild type protein as well as 
nonfunctional domains of the same wild type protein. Such 
peptides are known as dominant negative mutant proteins. 
Examples of dominant negative mutants are given below. 

CDC7 dominant negative mutant - Nematode resistance 

In a special embodiment of the present invention, a DNA 
vector comprises DNA, coding for a mutein according to the 
present invention, that is operably linked to a 
nematode-induced promoter, said promoter being functional in 
plant cells. Nematode infection of plants may cause severe 
problems to plant growth and crop generation. After 
penetrating the roots of their hosts, nematodes induce, at the 
infection sites, the development of feeding cells, specialised 
in the uptake of solutes from the vascular system of the 
plant. These infection sites are of crucial importance for the 
development of the parasite. In this way, root-knot nematodes 
induce multinucleated giant cells in the infected plant with 
highly elevated DNA contents. By specifically blocking the DNA 
synthesis in the feeding cells, the formation of the said 
multinucleated giant cells may be blocked, so that the 
nematodes may not further develop. One can contemplate that a 
CDC7 mutein, which is no longer capable of inducing the onset 
of the DNA synthesis, e.g. by loss of one or more 
phosphorylation sites or loss of binding function to a plant 
homolog of yeast DBF4 (Jackson et al. 1993, Mol Cell Biol 13, 
2899-2908) could, when present in sufficient amounts, block the 
onset of the DNA synthesis. When DNA, coding for such a mutein, 
and under the control of a promoter, functional in plant cells 
and inducible by the presence of nematodes in or in the 
vicinity of the plant cells, is comprised in the plant cells, 
the mutein can be expressed in the presence or vicinity of 
nematodes. This may lead to a DNA synthesis block, therewith 
avoiding further nematode development. The advantage of such 
a system is the fact that the plant is not producing any 
heterologous nematocide, that may be harmful for the plant 
itself. Such a system is not restricted to CDC7. The person 
skilled in the art, aware of this application, will be well 
aware of the possibilities to take other DNA replication 
modulating proteins, such as CDC27, for developing an analogous 
anti-nematode system. 

CDC27 mutant - Endoreduplication 

A further embodiment of the invention involves the 
downregulation of CDC27, resulting in suppression of the APC 
complex, modulation of DNA replication and/or blocking of 
mitosis. This can be achieved by expression of CDC27 point 
mutants. An alternative strategy can be envisaged involving a 
CDC27 mutein consisting of a block of TPR tandem repeats. Such 
a mutein is still likely to interact with other TPR-containing 
proteins from the APC such as CDC16 and CDC23 or APC regulator 
proteins such as PP5. As such, APC component proteins or APC 
regulator proteins would probably be titrated out and normal 
APC function be prevented. Based on results already obtained 
from experiments designed to delineate TPR domains involved in 
the interaction between two TPR proteins (Lamb et al. 1994, 
EMBO J 13, 4321-4328; Ollendorf and Donoghue 1997, J Biol Chem 
272, 32011-32018), this strategy might indeed prove valuable. 
Overexpression of CDC27 muteins, via the effect on the APC, 
can be used to enhance endoreduplication in plant cells, plant 
tissues, plant organs, or whole plants. 

For example, as is described above, a CDC27 mutein 
wherein the SEQ ID No 7 has been mutated, leading to the 
incapability of this mutein to bind with other factors of 
the APC, can be mentioned. The mutated protein would still be 
able to interact with the substrate, therewith titrating out 
the APC, abolishing or at least seriously reducing the APC 
function, leading to the formation of polyploid cells. Also, 
mutations in SEQ ID No 6 could render the mutein incapable 
of interacting with the substrate but still capable of 
binding with the other factors of the APC complex. The 
result is the generation of a dominant negative, as the 
complex will not be able to drive the destruction of key 
components of the cell cycle machinery that are responsible 
for controlling the number of DNA replication cycles. 

By manipulating the level of endoreduplication one can 
increase the storage capacity of, for example, endosperm 
cells. Thus, another aspect of the current invention is that 
one or more DNA sequences, vectors or proteins, regulatory 
sequences or recombinant DNA molecules of the invention can 
be used to modulate, for instance, endoreduplication in 
storage cells, storage tissues and/or storage organs of 
plants or parts thereof. 

Preferred target storage organs and parts thereof for 
the modulation of endoreduplication are, for instance, seeds 
(such as from cereals, oilseed crops), roots (such as in 
sugar beet), tubers (such as in potato) and fruits (such as 
in vegetables and fruit species). Furthermore it is expected 
that increased endoreduplication in storage organs and parts 
thereof correlates with enhanced storage capacity and as 
such with improved yield. In yet another embodiment of the 
invention, a plant with modulated endoreduplication in the 
whole plant or parts thereof can be obtained from a single 
plant cell by transforming the cell, in a manner known to 
the skilled person, with the above-described means. 

CDC27 and CDC7 mutants - Sterile plants 

Another embodiment of the invention relates to a method 
for modulating DNA replication and the resultant generation 
of male or female sterile plants. This would be achieved by 
the expression of dominant negative mutants of either CDC7 
or CDC27 under the control of very specific promoters, 
either from male or female gametophytes, to block cell 
division and disrupt meiosis. The resulting plants would be 
naturally sterile. 

Overexpression of CDC7 and DBF4 activates DNA synthesis 

Another embodiment of the invention relates to a method 
for the generation of plant cells, plant tissues, plant 
organs, or whole plants with the capacity for the 
overexpression of CDC7 in combination with a plant homolog 
of Dbf4, thereby modulating DNA replication. Results in yeast 
indicate that the association of Dbf4 with CDC7 is essential 
for the G1 to S transition, namely DNA synthesis (Ohtoshi A, 
Miyake T, Arai K, Masai H; Mol Gen Genet 254(5): 562-70, 1997 
May 20). Therefore in the present invention, by 
overexpressing both CDC7 and Dbf4 proteins, one can 
activate, stimulate or initiate DNA synthesis in cells where 
DNA synthesis does not normally take place, such as cells 
that have already gone through the cell cycle. As a 
consequence the amount of DNA is increased in the cell, 
therewith manipulating the level of endoreduplication as is 
outlined above. 

Polyploid plants 

Another embodiment of the invention relates to the 
generation of polyploid plant cells, plant parts or plants. 

If, for example, plant cells are transformed with a 
vector, comprising the coding sequence of plant CDC27, 
according to the present invention, under the control of a 
suitable promoter and optionally other expression 
controlling elements, these plant cells may produce CDC27. 
When the said plant cells produce CDC27 protein in a 
sufficient amount, extra rounds of DNA replication may take 
place before mitosis, leading to polyploid cells. 

Characterisation of CDC7 and CDC27 genes 

The architecture of the CDC7 and CDC27 genes is 
illustrated in figures 1 and 2. Figure 1 illustrates the 
genomic architecture of the Arabidopsis CDC7 gene, wherein 
the exons are boxed. The numbers above the box indicate the 
length of the exon, the number below and between two boxes 
indicates the length of the intron. 

The total length of the coding sequence is 2667 
nucleotides, coding for 889 amino acids. The fifth, eleventh 
and thirteenth exons comprise novel coding sequence; in 
figure 1, the corresponding boxes are black. It is to be 
understood, and obvious to a skilled person, that the first 
and the last triplet of the coding sequence of an exon may 
partially be encoded by the last two or one nucleotide(s) 
of the adjacent upstream exon and, accordingly, by the 
first two or one nucleotide(s) of the adjacent downstream 
exon. In figure 2, the genomic architecture of the CDC27 
gene of Arabidopsis thaliana is depicted as explained for 
figure 1. The second and the sixteenth (last) exon (black in 
figure 2) comprise novel coding sequences and were not 
identified in the known genomic CDC27 sequence of 
Arabidopsis thaliana (see text). The entire sequence 
comprises 2187 nucleotides, corresponding to 728 amino 
acids. 
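
The nucleotide and amino acid counts quoted above can be cross-checked with simple codon arithmetic. The following is only an illustrative sketch; the function name is hypothetical, and the reading that the CDC27 figure of 2187 nucleotides and the 2670 nucleotides listed for SEQ ID NO 8 include a stop codon is an inference from the numbers, not a statement of the specification.

```python
def coding_length(n_amino_acids, include_stop=True):
    """Nucleotides needed to encode a protein of the given length.

    Each amino acid takes one codon (3 nt); a stop codon adds 3 nt more.
    """
    return 3 * n_amino_acids + (3 if include_stop else 0)


# CDC7: 889 amino acids
cdc7_without_stop = coding_length(889, include_stop=False)  # figure quoted in the text
cdc7_with_stop = coding_length(889)                         # length listed for SEQ ID NO 8

# CDC27: 728 amino acids
cdc27_with_stop = coding_length(728)

print(cdc7_without_stop, cdc7_with_stop, cdc27_with_stop)
```

Under these assumptions the CDC7 count of 2667 nucleotides corresponds to the codons of the 889 residues alone, whereas the CDC27 count of 2187 nucleotides corresponds to 728 codons plus a stop codon.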

In figures 3 and 4, the complete cDNA sequences of CDC7 
and CDC27, respectively, according to the present invention 
are depicted, with the respective encoded amino acid 
sequence therebelow. Vertical lines in the nucleotide 
sequence indicate the exon boundaries, i.e. 2 | 3 is the 
boundary between exons 2 and 3. The exon boundaries are 
derived from genomic CDC7 and CDC27 sequences (see examples 
1 and 2 respectively). Such lines are also drawn in the 
amino acid sequence, although, as is indicated above, the 
amino acids flanking such a vertical line may be partially 
encoded by the adjacent exon. Exact positioning of the 
vertical line is in such a case not possible and is set at 
the left or the right of such an amino acid in an arbitrary 
manner. See examples 1 and 2 for further details. 

The invention will now be further illustrated by the 
following examples, that are not intended to limit the scope 
of the invention. 

EXAMPLES 

Although in general the techniques mentioned herein are 
well known in the art, reference may be made in particular 
to Sambrook et al., Molecular Cloning, A Laboratory Manual 
(1989) and Ausubel et al., Current Protocols in Molecular 
Biology (1995), John Wiley & Sons, Inc. Further, scientific 
explanations and reasonings in the examples are given for 
illustrative reasons only, without however being bound 
thereto. 



Example 1. ISOLATION OF AN ARABIDOPSIS CDC7 HOMOLOGUE 

Conserved regions of the Saccharomyces cerevisiae and 
Schizosaccharomyces pombe CDC7 homologue genes were used to 
synthesize degenerate oligonucleotides to amplify an 
Arabidopsis CDC7 homologue cDNA fragment. These 
oligonucleotides were as follows: 

1 (sense): 
5' AAA/G ATA/C/T GGA/C/G/T GAA/G GGA/C/G/T ACA/C/G/T TT 3' 

2 (sense): 
5' ATA/C/T ATA/C/T CAC/T AGA/G GAA/G ATA/C/T AA 3' 

3 (antisense): 
5' AG C/TTC A/C/G/TGG A/C/G/TGC C/TCT A/GAA A/C/G/TCC 3' 

4 (antisense): 
5' AC A/C/G/TCC A/C/G/TA/GC A/GCT CCA A/C/G/TAT A/GTC 3' 

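The slash notation used for the degenerate oligonucleotides above can be read mechanically: bases joined by a slash are alternatives at a single position. As a minimal sketch (the function names are hypothetical and the parser assumes well-formed input), the following converts such an oligonucleotide to the standard IUPAC ambiguity codes and counts how many distinct sequences the degenerate pool contains.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of allowed bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def parse_degenerate(oligo):
    """Return one set of allowed bases per position.

    A slash joins alternatives at one position, so "AAA/G" is A, A, then A or G.
    """
    s = "".join(oligo.split())  # drop the spaces that separate triplets
    positions, i = [], 0
    while i < len(s):
        bases = {s[i]}
        # Absorb every "/X" that directly follows the current base.
        while i + 2 < len(s) and s[i + 1] == "/":
            bases.add(s[i + 2])
            i += 2
        positions.append(frozenset(bases))
        i += 1
    return positions


def to_iupac(oligo):
    """Write the oligonucleotide with IUPAC ambiguity codes."""
    return "".join(IUPAC[p] for p in parse_degenerate(oligo))


def degeneracy(oligo):
    """Number of distinct sequences represented by the degenerate oligonucleotide."""
    return prod(len(p) for p in parse_degenerate(oligo))


OLIGO_1 = "AAA/G ATA/C/T GGA/C/G/T GAA/G GGA/C/G/T ACA/C/G/T TT"
print(to_iupac(OLIGO_1), len(parse_degenerate(OLIGO_1)), degeneracy(OLIGO_1))
```

For sense oligonucleotide 1 this gives a 20-mer, AARATHGGNGARGGNACNTT in IUPAC notation, representing 768 different sequences.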

First strand cDNA prepared from whole Arabidopsis 
plants using the Superscript Preamplification System from 
Life Technologies was used as template in nested PCR 
reactions. The first reaction was carried out using the pair 
of oligos 1 and 4, and the second reaction used oligos 2 and 3. 
PCR conditions were essentially as described (Ferreira et 
al. 1991). A fragment of approximately 650 bp was eluted 
from an agarose gel, cloned in pGEM-T and sequenced. 

Sequence comparison using the GCG-package version 9.1 
showed that the deduced amino acid sequence of the PCR 
fragment has approximately 40% homology to the published 
yeast CDC7 sequences. This fragment was then used to screen 
a lambda gt10 cDNA library prepared from total Arabidopsis 
plants. The largest cDNA isolated, approximately 1.2 kb, was 
completely sequenced by the dideoxy method. This Arabidopsis 
cDNA contains an open reading frame encoding a 
polypeptide of 384 amino acids (amino acid 473 to amino acid 
856 in figure 3). With the SRS search program the EMBL and 
EMBLnew databanks were screened for gene sequences 
designated or annotated with the term cdc7. One genomic 
sequence from Arabidopsis thaliana was found (accession 
number Z97342). This submitted genomic sequence comprised a 
predicted gene, indicated as "having similarity to protein 
kinase HSK of fission yeast", having 11 exons and coding for 
a protein having 829 amino acids. 

With the GCG-package version 9.1, the said genomic 
sequence was compared with the identified partial cDNA 
sequence, using the "best-fit" program. The identified cDNA 
sequence covered nucleotides 119827 to 121978 of the genomic 
sequence of Z97342. 

The identified cDNA sequence did not correspond with 
the complete coding sequence of the predicted gene on the 
Z97342 sequence. Within the present cDNA sequence, two 
additional coding sequences (additional exons) were 
identified, namely nucleotides no 120770-120709 and 
120350-120263 of Z97342, coding for the amino acid sequences 
of SEQ ID NOS 3 and 4 respectively. 
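
The lengths of the additional exons follow from the genomic coordinates given above. The sketch below is illustrative only; the helper name is hypothetical, and reading the descending coordinates as a gene lying on the reverse strand of the Z97342 entry is an assumption.

```python
def span_length(start, end):
    """Inclusive length of a coordinate range, whichever direction it runs."""
    return abs(start - end) + 1


# Coordinates in Z97342 as quoted in the text (they run from high to low).
ADDITIONAL_EXONS = {
    "SEQ ID NO 3": (120770, 120709),
    "SEQ ID NO 4": (120350, 120263),
}

for name, (start, end) in ADDITIONAL_EXONS.items():
    length = span_length(start, end)
    full_codons, leftover = divmod(length, 3)
    print(f"{name}: {length} nt = {full_codons} codons + {leftover} nt")
```

The ranges span 62 and 88 nucleotides, i.e. 20 and 29 complete codons, which agrees with the peptide lengths listed for SEQ ID NOS 3 and 4. The one or two leftover nucleotides are consistent with codons that are shared with a neighbouring exon.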

Upon comparison with the genomic Arabidopsis sequence, 
it however appeared that the present cDNA was not complete. 
To complete our cDNA at the 5' side we used the CAP-finder 
kit (Clontech), using the primers (CTCTCCCATCTGGTCATGTC, #1; 
GAACATGCAGTAGCCGTACC, #2) specific for the cDNA, in nested 
PCR reactions. For the missing 3' end, two nested sequences 
specific for the cDNA (AAATGGTGCGAACTCAACAC, #2) and 
(TATGGGAAGTAGCCAAGCTG, #1) and an anchored oligo-dT on the 
lower strand were used. PCR conditions were essentially as 
described (Ferreira et al., 1991). The fragments were 
eluted from agarose gel, cloned using standard techniques 
and sequenced. The deduced amino acid sequence encoded by 
the PCR fragment showed clear homology to the published 
yeast CDC7 sequences and matched with the above 
mentioned Arabidopsis genomic sequence. The DNA fragment 
comprising the missing 5' terminal sequence comprised an 
additional coding sequence of 63 nt (nrs 122340 to 122278 in 
Z97342) not identified in Z97342, coding for the amino acid 
sequence of SEQ ID NO 2. 
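
As an illustration of how a gene-specific primer for the 5' extension relates to the cDNA, the sketch below takes primer #1 quoted above, forms its reverse complement and locates it in a 50-nucleotide stretch of the CDC7 cDNA. The stretch is copied from positions 1801 to 1850 of SEQ ID NO 8 as printed in the sequence listing; the variable and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Positions 1801-1850 of SEQ ID NO 8, as printed in the sequence listing.
CDNA_1801_1850 = "AGAGCTCGGAATGACATGACCAGATGGGAGAGACTCAATAGCCAAGGGGC"
FRAGMENT_START = 1801

# 5' extension primer #1 from the text (antisense with respect to the cDNA).
PRIMER_5P_1 = "CTCTCCCATCTGGTCATGTC"

site = reverse_complement(PRIMER_5P_1)
offset = CDNA_1801_1850.find(site)  # -1 would mean "not found"
print(site, FRAGMENT_START + offset)
```

The reverse complement of the primer is found in the sense strand starting at position 1813, so the primer anneals within the partial cDNA and is extended towards the missing 5' end.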

With the obtained sequences, the complete cDNA for the 
CDC7 homologue of Arabidopsis thaliana could be 
reconstructed, which is illustrated in figure 3 and in SEQ 
ID NO 8. 

The presently identified CDC7 cDNA comprises 
additional novel coding sequences, corresponding to novel 
exons (nos 5, 11 and 13 in figure 3), that were not 
identified in Z97342, and codes for a protein of 889 amino 
acids. 

Example 2. ISOLATION OF AN ARABIDOPSIS CDC27 HOMOLOGUE 

Conserved regions of the published CDC27 homologue 
genes (Sikorski et al., 1991, Cold Spring Harbor Symposia on 
Quantitative Biology vol LVI, 663-673) were used to 
synthesize degenerate oligonucleotides to amplify 
Arabidopsis CDC27 cDNA. The oligonucleotides were as 
follows: 



1 (sense): 
5' TGG GTA/C/G/T TTA/G GCA/C/G/T A/CAA/G GG 3' 

2 (sense): 
5' ATG GAA/C/G/T G/ATT/C/A TA/TC/T AGA/C/G/T AC 3' 

3 (antisense): 
5' AGA/G CAT/C TAT/C AAT/C GCA/C/G/T TGG 3' 

4 (antisense): 
5' TA T/A/G AC/T CAT A/C/G/TCC C/TAA A/C/G/CC A/GAA 3' 

First strand cDNA prepared from flower buds was used 
as template in nested PCR reactions. The first reaction was 
carried out using the pair of oligos 1 and 4, and the second 
reaction used oligos 2 and 3. PCR conditions were as 
described (Ferreira et al., 1991, Plant Cell 3, 531-540), 
except that the annealing temperature of the first reaction 
was 45 °C, and for the second reaction, 37 °C was used. A 
fragment of approximately 300 bp was eluted from agarose gel 
and cloned in pGEM-T. Out of 16 clones sequenced, two showed 
high homology to published CDC27 sequences (Sikorski et al., 
1991, Cold Spring Harbor Symposia on Quantitative Biology vol 
LVI, 663-673). This fragment was then used to screen a 
lambda gt10 cDNA library prepared from total Arabidopsis 
plants. The isolated target cDNA, approximately 2.5 kb, was 
completely sequenced by the dideoxy method and is shown in 
fig 4 and in SEQ ID NO 9. A combination of restriction 
enzymes and oligonucleotide subcloning was used to produce 
the templates for sequencing. 

The Arabidopsis CDC27 cDNA contains one open reading 
frame, encoding a polypeptide of 728 amino acids (figure 4). 
With the SRS search program, the databanks EMBL and EMBLnew 
were screened for gene sequences homologous to the present 
CDC27 cDNA sequence. A genomic sequence from Arabidopsis 
thaliana (accession number AC001645) was found, comprising 
14 exons, coding for a protein of 728 AA. With the 
GCG-package version 9.1, the present cDNA sequence was compared 
with the said genomic Arabidopsis sequence using the 
"best fit" program. It appeared that the present cDNA 
comprised additional coding information for two novel exons, 
namely the second and last exon of the Arabidopsis CDC27 
gene (exons 2 and 16 in fig 4). 

The amino acid sequences encoded by the second and 
last exon are depicted in SEQ ID NOS 6 and 7 respectively. 



Example 3. DOMINANT NEGATIVE MUTANTS OF CDC7 

Dominant negative mutants of CDC7 (CDC7 DN) are constructed 
by creating substitution mutations including amino acid 
residues 1(G), 5(V), 18(A) and 20(K) of SEQ ID No 2; amino 
acid residues 13(T), 16(F), 18(A) and 20(E) of SEQ ID No 3; 
amino acid residues 7(L) and 18(K) of SEQ ID No 4. 
Substitutions are not conservative. Expression of a CDC7 DN 
in a whole plant, a plant tissue, a plant organ or a plant 
cell results in cell cycle arrest at G1/S. These results are 
in line with the situation in yeast, wherein one such 
substitution, threonine 13 of SEQ ID No 3 (position 722 in 
SEQ ID No 1) to a glutamate has proven to create a dominant 
negative CDC7 in yeast. This CDC7 DN is inactive as a kinase 
but can still bind DBF4, thus preventing activation of 
wild-type CDC7 molecules (Ohtoshi et al. 1997, Mol Gen Genet 
254, 562-570). 
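
The threonine to glutamate substitution mentioned above can be written out on the peptide of SEQ ID NO 3. The following is a minimal in silico sketch, not the mutagenesis procedure itself; the function name is hypothetical, and the start position 710 of SEQ ID NO 3 within SEQ ID NO 1 is read off the sequence listing rather than stated in the text.

```python
SEQ_ID_3 = "DVIEKKDGPCSGTKGFRAPE"
# First residue of SEQ ID NO 3 within SEQ ID NO 1 (read off the sequence listing).
SEQ_ID_3_START = 710


def substitute(peptide, position, expected, replacement):
    """Replace the residue at a 1-based position, checking the original residue first."""
    found = peptide[position - 1]
    if found != expected:
        raise ValueError(f"expected {expected} at position {position}, found {found}")
    return peptide[:position - 1] + replacement + peptide[position:]


# Threonine 13 of SEQ ID NO 3 replaced by glutamate (T13E).
mutant = substitute(SEQ_ID_3, 13, "T", "E")
position_in_full_protein = SEQ_ID_3_START + 13 - 1

print(mutant, position_in_full_protein)
```

With these inputs the mutated peptide reads DVIEKKDGPCSGEKGFRAPE, and residue 13 of SEQ ID NO 3 maps to position 722 of SEQ ID NO 1, in agreement with the position quoted in the example.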

The CDC7 DN mutants can be obtained by site-directed 
mutagenesis using the ExSite PCR-based site-directed 
mutagenesis kit (Stratagene, La Jolla, CA). Fidelity of the 
mutagenesis is confirmed by sequencing. 

Example 4. MUTANTS OF CDC27 

Several types of CDC27 muteins can be considered: 

(1) Insertion of an amino acid such as proline (P) in the 
amino acid sequence SEQ ID No 7, e.g. behind the 
tyrosine (Y) residue, leads to a loss of function of 
the APC. It is believed that such an insertion deforms 
the predicted α-helix of the novel TPR-like domain of 
which SEQ ID No 7 is part and causes a disturbance of 
the overall three-dimensional structure of CDC27, 
therewith titrating out functional proteins of the 
APC, such as CDC16 or CDC23, leading to loss of APC 
function. In line with these results, altering the 
α-helix structure in one of the TPR units of yeast 
CDC27 has been proven, and of any of the TPR units has 
been hypothesized, to destroy CDC27 function (Lamb et 
al. 1994, EMBO J. 13, 4321-4328). 

(2) Deletion of the NH2-terminal 200 to 220 amino acids of 
CDC27 also leads to loss of function of the APC by 
titrating out molecules such as APC substrates or APC 
regulators. This domain encompasses the conserved 
amino acid sequence SEQ ID No 6 as well as the first 
TPR unit of CDC27. Deletion of this sequence in human 
CDC27 abrogates binding of e.g. CDC16, but not that 
of e.g. PP5, an APC regulator protein (Ollendorf and 
Donoghue 1997, J Biol Chem 272, 32011-32018). 

(3) CDC27 muteins consisting of the conserved NH2-terminal 
domain (containing SEQ ID No 6) and 1, 2 or more of the 
downstream TPR units. 

(4) CDC27 muteins consisting of the novel TPR-like domain 
(ending with SEQ ID No 7) preceded by 1, 2 or more of 
the upstream TPR units. 

Muteins described in (3) and (4) act as those described in 
(1) or (2). 

The point mutants in (1) are obtained by site-directed 
mutagenesis using the ExSite PCR-based site-directed 
mutagenesis kit (Stratagene, La Jolla, CA). Fidelity of the 
mutagenesis is confirmed by sequencing. Deletion mutants in 
(2), (3) and (4) are obtained by high-fidelity PCR (Expand 
High Fidelity PCR System, Boehringer, Mannheim) using 
primers designed to amplify the desired stretches of the 
CDC27 nucleotide sequence. Primers include extensions 
recognized by restriction endonucleases to allow easy 
cloning in a vector such as pUC18. Amplified sequences are 
checked by nucleotide sequence determination. 

Expressing such CDC27 muteins in a whole plant, a plant 
tissue, a plant organ or a plant cell will cause 
malfunctioning of the APC and thus repetitive cycles of DNA 
synthesis without intervening mitosis. This 
endoreduplication results in a polyploid phenotype. 
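
The mutein types (1) and (2) of this example can be pictured at the sequence level. The sketch below is illustrative only; the helper names are hypothetical, and the 728-residue placeholder string merely stands in for the full CDC27 sequence of SEQ ID NO 5.

```python
SEQ_ID_7 = "AYMERLILPDELVTEENL"


def insert_after_first(peptide, residue, insertion):
    """Insert residues directly behind the first occurrence of a given residue."""
    idx = peptide.index(residue)
    return peptide[:idx + 1] + insertion + peptide[idx + 1:]


def delete_n_terminus(protein, n_residues):
    """Remove the given number of residues from the NH2-terminus."""
    return protein[n_residues:]


# Type (1): proline inserted behind the tyrosine of SEQ ID NO 7.
proline_mutein = insert_after_first(SEQ_ID_7, "Y", "P")

# Type (2): NH2-terminal deletion, shown on a placeholder of CDC27 length
# (hypothetical stand-in for SEQ ID NO 5, not the real sequence).
placeholder_cdc27 = "X" * 728
truncated = delete_n_terminus(placeholder_cdc27, 200)

print(proline_mutein, len(truncated))
```

Here the insertion yields AYPMERLILPDELVTEENL, and a deletion of 200 residues leaves 528 residues; a deletion of 220 residues would be obtained in the same way.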

Example 5. NEMATODE RESISTANCE - CDC7 DN 

In order to obtain nematode resistance, the CDC7 DN coding 
sequence is operably linked to a plant promoter responsive 
to nematode infection and to the NOS polyadenylation site. 
The ARM1 or Att0728 promoters can be used (Barthels et al. 
1997, Plant Cell 9, 2119-2134). The CDC7 DN expression 
cassette is subsequently transferred to a binary vector such 
as pGSC1704 and the resulting vector electroporated into 
Agrobacterium tumefaciens C58C1RifR (pGV2260). Transformants 
are selected on streptomycin/spectinomycin containing medium 
and checked for the presence of the integral transformed 
binary vector. Arabidopsis thaliana Col-0 is transformed 
with the selected A. tumefaciens strain by the floral dip 
method (Clough and Bent 1998, Plant J 16, 735-743). 
Transgenic plants are selected after seed germination in the 
presence of hygromycin. Selected transgenic lines and 
untransformed control lines are infected with root knot or 
cyst nematodes. Successfulness of infection is scored 

promoter such as Osg6B (Tsuchiya et al. 1995, Plant Cell 
Physiol 36, 487-494) and to a NOS polyadenylation site will 
result in a suitable expression cassette. Introduction of 
this cassette into A. thaliana is done as described in 
example 5. Selected transformant lines have a reduced and/or 
abnormal pollen formation/development. This is assessed 
using microscopic methods. 

Example 7. ENDOREDUPLICATION - CDC27 muteins 

Any of the muteins are operably linked to a constitutive 
promoter such as the CaMV 35S promoter (Kay et al. 1987, 
Science 236, 1299-1302) or to a seed endosperm-specific 
promoter such as from a 2S albumin seed storage protein 
(Guerche et al. 1990, Plant Cell 2, 469-478) or to the BL22 
promoter (Carbonero et al., 1999, in press) and to a 
polyadenylation signal. Such expression cassettes are 
transferred to A. thaliana as described in example 5. 
Selected transformant lines have a generally higher rate of 
endoreduplicating cells (CaMV 35S promoter) and/or produce 
seeds with a higher amount of polyploid endosperm cells (2S 
albumin promoter). Endoreduplication or polyploidism is 
assessed in several ways. 

(1) Confocal microscopy is applied to measure the nuclear 
diameter. Polyploid cells normally have enlarged 
nuclei in order to harbor the increased DNA content. 
(2) The DNA content of plant cells is measured by flow 
cytometry (Galbraith et al. 1991, Plant Physiol 96, 
985-989). 
(3) The cyclin B-degrading activity of the APC is 
determined as described by King et al. (1995, Cell 81, 
279-288). 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: 
(A) NAME: CropDesign NV 
(B) STREET: Technologiepark 3 
(C) CITY: Zwijnaarde-Gent 
(D) STATE: none 
(E) COUNTRY: Belgium 
(F) POSTAL CODE (ZIP): 9052 

(ii) TITLE OF INVENTION: Plant DNA replication modulating proteins 

(iii) NUMBER OF SEQUENCES: 9 

(iv) COMPUTER READABLE FORM: 
(A) MEDIUM TYPE: Floppy Disk 
(B) COMPUTER: IBM PC compatible 
(C) OPERATING SYSTEM: PC-DOS/MS-DOS 
(D) SOFTWARE: Wordperfect 5.2 


(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 889 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: No 

(ix) FEATURE: 
(A) NAME/KEY: CDC7 
(B) LOCATION: 

(xi) SEQUENCE DESCRIPTION: 

1 MSENSEPRQL ENSTAGRELI PLSPTNSDGN DDLNYHLHAF ELSRLLLSSG 
51 HPESVIDLSS KCTYFQGSPN LVKYLCSIPN SPISLAEDGF TVTLSPESPS 
101 APASFACSLD LQENVVLEQF MDPRSLTLKH SRENAEQEEL ELMPLPKRSR 
151 NDGNDVNYSV IDSRPNDIRT VACGTMLGTI LALESQASVF NLSASNRGIE 
201 AFVQDHQPGP QTSNASVDVN PTHRLEESKN DLPSPQEDGY YERPEIGDFQ 
251 IADNQILIEE GDDKNKKDLF PKGEIQTDSV QSDPVASLMP TENELEPVQI 
301 VDDTEDLLVD DHTVDIVSTP DRELPLKPSA TEANQDKSLV QKTLDQCKLP 
351 GNSKTYSCSP EIKHTRKSKV IQKRKQNFNT VRLKDQKDQA KHNTIPDFDS 
401 YTIVEEEGSG GYGIVYKATR KTDGTEFAIK CPHVGAQKYY VNNEIRMLER 
451 FGGKNCIIKH EGCLKNGDSD CIILEHLEHD RPDSLKREID VYQLQWYGYC 
501 MFKALSSLHK QGVVHRDVKP GNFLFSRKTN KGYLIDFNLA MDLHQKYRRA 
551 DKSKAASGLP TASKKHHTLV KSLDAVNRGT NKPSQKTLAP NSIKKAAGKT 
601 RARNDMTRWE RLNSQGAEGS GLTSAKDVTS TRNNPSGEKR REPLPCHGRK 
651 ALLDFLQETM SVPIPNHEVS SKAPTSMRKR VAALPGKAEK ELLYLTPMPL 
701 CSNGRPEAGD VIEKKDGPCS GTKGFRAPEV CFRSLHQGPK IDVWSAGVTL 
751 LYLIMGRTPF TGDPEQNIKD IAQLRGSEEL WEVAKLHNRE SSFPKELYES 
801 RYLKGMELRK WCELNTKRRE FLDVIPLSLL DLVDKCLTVN PRRRISAEDA 
851 LKHDFFHPVH ETLRNQMLLK QQPTVVADAV SQTLNYLQL 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: 

GYGIVYKATRKTDGTEFAIK 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: 

DVIEKKDGPCSGTKGFRAPE 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 29 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: 

NIKDIAQLRGSEELWEVAKLHNRESSFPK 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 728 amino acids 
(B) TYPE: amino acid 
(C) STRANDEDNESS: 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(ix) FEATURE: 
(A) NAME/KEY: CDC27 
(B) LOCATION: 

(xi) SEQUENCE DESCRIPTION: 

1 MMENLLANCV QKNLNHFMFT NAIFLCELLL AQFPSEVNLQ LLARCYLSNS 
51 QAYSAYYILK GSKTPQSRYL FAFSCFKLDL LGEAEAALLP CEDYAEEVPG 
101 GAAGHYLLGL IYRYSGRKNC SIQQFRMALS FDPLCWEAYG ELCSLGAAEE 
151 ASTVFGNVAS QRLQKTCVEQ RISFSEGATI DQITDSDKAL KDTGLSQTEH 
201 IPGENQQDLK IMQQPGDIPP NTDRQLSTNG WDLNTPSPVL LQVMDALPPL 
251 LLKNMRRPAV EGSLMSVHGV RVRRRNFFSE ELSAEAQEES GRRRSARIAA 
301 RKKNPMSQSF GKDSHWLHLS PSESNYAPSL SSMIGKCRIQ SSKEVIPDTV 
351 TLNDPATTSG QSVSDIGSSV DDEEKSNPSE SSPDRFSLIS GISEVLSLLK 
401 ILGDGHRHLH MYKCQEALLA YQKLSQKQYN THWVLMQVGK AYFELQDYFN 
451 ADSSFTLAHQ KYPYALEGMD TYSTVLYHLK EEMRLGYLAQ ELISVDRLSP 
501 ESWCAVGNCY SLRKDHDTAL KMFQRAIQLN ERFTYAHTLC GHEFAALEEF 
551 EDAERCYRKA LGIDTRHYNA WYGLGMTYLR QEKFEFAQHQ FQliALQINPR 
601 SSVIMCYYGI ALHESKRNDE ALMMMEKAVL TDAKNPLPKY YKAHILTSLG 
651 DYHKAQKVLE ELKECAPQES SVHASLGKIY NQLKQYDKAV LHFGIALDLS 
701 PSPSDAVKIK AYMERLILPD ELVTEENL 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 24 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: 

VNLQLLARCYLSNSQAYSAYYILK 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: 

AYMERLILPDELVTEENL 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2670 nucleotides 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: No 

(ix) FEATURE: 
(A) NAME/KEY: CDC7 
(B) LOCATION: 

(xi) SEQUENCE DESCRIPTION: 

ATGTCAGAAA ACTCGGAACC GCGTCAACTC GAGAATTCTA CAGCCGGAAG 



2 0 51 AGAGCTCATT CCTCTTAGTC CCACCAATTC AGACGGCAAC GACGACCTTA 



25 



30 



101 ACTATCATCT GCATGCTTTT GAGTTATCTC GTCTCCTACT TTCTTCTGGT 

151 CATCCAGAAT CTGTTATAGA TCTTTCTTCA AAGTGTACAT ACTTCCAAGG 

2 01 TTCTCCTAAT CTCGTCAAAT ATCTTTGCTC GATCCCTAAT TCTCCTATTT 

251 CCCTTGCCGA AGATGGCTTC ACTGTGACTC TCTCGCCTGA GTCTCCCTCC 



35 3 01 GCTCCGGCTA GTTTCGCCTG TAGTTTGGAT TTGCAGGAAA ATGTTGTGTT 



40 



45 



351 AGAACAGTTT ATGGATCCGA GATCTCTCAC GCTAAAGCAT TCGAGAGAGA 

401 ATGCGGAACA AGAGGAGCTA GAGCTCATGC CATTGCCCAA AAGAAGTCGA 

451 AATGATGGAA ACGATGTGAA TTACTCTGTA ATAGATAGCA GACCTAACGA 

501 CATCAGAACT GTTGCCTGTG GAACTATGCT TGGGACTATT TTAGCTCTTG 



50 551 AATCCCAAGC TTCGGTTTTC AATTTAAGTG CATCTAACCG AGGAATAGAG 



55 



601 GCTTTTGTTC AAGATCATCA GCCTGGTCCG CAGACATCCA ATGCTTCAGT 
651 GGATGTCAAT CCTACACATC GGTTAGAGGA AAGCAAGAAC GATTTGCCAT 
701 CTCCTCAGGA GGATGGATAT TACG AG CG AC CTGAAATTGG AGATTTCCAA 



46 
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ATTGCTGACA ACCAAATATT AATCGAAGAA GGTGATGATA AAAATAAGAA 
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8 01 GGATCTCTTC CCTAAGGGAG AGATACAAAC TGATTCTGTG CAGTCCGATC 

851 CCGTTGCCTC ATTGATGCCA ACAGAAAATG AGTTAGAACC AGTGCAGATT 

901 GTGGATGACA CTGAAGATCT ACTTGTAGAT GATCACACTG TAGACATCGT 

951 TAGCACCCCT GACAGAGAGC TGCCGTTGAA GCCTTCTGCT ACAGAAGCTA 

1001 ATCAAGATAA ATCTTTGGTA CAAAAAACTC TGGATCAATG CAAATTGCCG 

1051 GGAAACAGCA AAACGTACAG CTGTTCCCCT GAGATAAAAC ACACCAGAAA 

1101 AAGTAAAGTT ATCCAGAAGA GGAAGCAGAA TTTTAACACC GTTCGTCTTA 

1151 AAGATCAGAA GGATCAGGCA AAGCATAACA CAATTCCAGA TTTTGATTCT 

12 01 TACACTATTG TAGAGGAAGA AGGTTCAGGT GGCTACGGGA TTGTTTATAA 
1251 GGCAACGAGG AAAACTGATG GAACAGAGTT TGCAATTAAA TGCCCTCATG 

13 01 TTGGCGCTCA GAAGTATTAT GTGAATAATG AAATCAGAAT GCTGGAGCGT 
1351 TTTGGGGGGA AAAACTGTAT AATAAAGCAT GAAGGCTGTC T CAAGAATGG 

14 01 AGATTCTGAT TGCATCATCC TTGAGCACCT TGAACATGAC AGACCTGATT 
14 51 CATTGAAGAG AGAAATAGAT GTGTATCAGC TGCAGTGGTA CGGCTACTGC 
1501 ATGTTCAAAG CTCTATCGAG TCTGCATAAG CAGGGTGTTG TTCATAGGGA 
1551 TGTTAAGCCA GGAAACTTCC TCTTCTCTAG GAAGACCAAC AAAGGCTATC 
16 01 TCATTGATTT TAACCTTGCC ATGGATTTGC ACCAGAAGTA CAGAAGAGCA 
16 51 GATAAATCAA AAGCAGCTTC AGGTCTTCCT ACCGCCAGCA AGAAACATCA 
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1701 TACATTGGTT AAATCACTCG ATGCGGTAAA CCGAGGGACC AACAAACCTT 

1751 CTCAGAAAAC TTTAGCGCCT AATAGTATCA AGAAAGCAGC GGGAAAGACA 

1801 AGAGCTCGGA ATGACATGAC CAGATGGGAG AGACTCAATA GCCAAGGGGC
1851 AGAAGGGTCT GGCTTAACTT CAGCTAAAGA TGTGACCAGC ACAAGGAACA
1901 ACCCTTCAGG TGAAAAGAGA AGAGAGCCTT TGCCATGTCA TGGAAGAAAA
1951 GCGCTTTTAG ATTTTCTGCA AGAGACAATG TCTGTTCCAA TTCCAAACCA
2001 TGAAGTATCA TCCAAAGCTC CTACGTCTAT GAGAAAACGG GTAGCTGCTC
2051 TTCCAGGGAA AGCTGAGAAG GAACTTCTTT ATCTGACCCC AATGCCACTG
2101 TGCTCTAACG GTCGGCCTGA AGCAGGGGAC GTAATTGAGA AGAAAGACGG
2151 TCCTTGCTCA GGAACCAAAG GCTTCCGAGC TCCAGAGGTT TGCTTCAGAT
2201 CTTTGCACCA AGGACCTAAG ATAGACGTGT GGTCTGCGGG AGTTACTTTG
2251 TTATACCTCA TAATGGGAAG GACACCTTTC ACTGGTGACC CTGAACAGAA
2301 CATAAAGGAC ATTGCACAAC TACGAGGCAG TGAAGAATTA TGGGAAGTAG
2351 CCAAGCTGCA CAACCGTGAA TCCTCTTTCC CTAAGGAATT ATACGAGTCA
2401 AGGTACTTGA AGGGGATGGA GTTGAGAAAA TGGTGCGAAC TCAACACAAA
2451 ACGCAGAGAG TTTCTAGACG TAATTCCACT ATCGCTTCTT GACCTCGTTG
2501 ATAAATGTTT GACCGTTAAC CCGAGGCGAC GAATCAGCGC AGAGGATGCT
2551 CTCAAGCACG ACTTCTTCCA TCCAGTACAT GAAACCCTTA GAAACCAAAT
2601 GCTCCTTAAA CAGCAGCCTA CAGTGGTTGC TGACGCAGTA AGCCAAACTC
2651 TAAACTATTT ACAATTGTAA
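
Claim 14 refers to stretches of this listing by nucleotide position, for example nucleotides 1229-1291 of SEQ ID No 8. The following is a minimal sketch of how such a stretch could be cut out of the listing, assuming that positions are 1-based and inclusive and that the row numbers of the listing give the position of the first base in each row. The function name `extract` and the constant `ROWS_1201_1300` are hypothetical and serve only as an illustration; the two rows are copied from the listing above with the spaces removed.

```python
def extract(fragment, fragment_start, first, last):
    """Return nucleotides first..last (1-based, inclusive).

    `fragment` is a piece of the full sequence whose first base sits at
    position `fragment_start` of that sequence (assumed numbering).
    """
    if first < fragment_start or last < first:
        raise ValueError("range starts outside the fragment")
    lo = first - fragment_start          # zero-based start index
    hi = last - fragment_start + 1       # zero-based end index (exclusive)
    if hi > len(fragment):
        raise ValueError("range ends outside the fragment")
    return fragment[lo:hi]


# Rows 1201 and 1251 of the SEQ ID NO: 8 listing, spaces removed.
ROWS_1201_1300 = (
    "TACACTATTGTAGAGGAAGAAGGTTCAGGTGGCTACGGGATTGTTTATAA"
    "GGCAACGAGGAAAACTGATGGAACAGAGTTTGCAATTAAATGCCCTCATG"
)

region = extract(ROWS_1201_1300, 1201, 1229, 1291)
print(len(region))  # prints 63
```

The sketch only slices the listing; it makes no statement about the reading frame or the peptide encoded by the stretch.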



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2187 nucleotides
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: No

(ix) FEATURE:
(A) NAME/KEY: CDC27
(B) LOCATION:

(xi) SEQUENCE DESCRIPTION:

1 ATGATGGAGA ATCTACTGGC GAATTGTGTC CAGAAAAACC TTAACCATTT
51 TATGTTCACC AATGCTATCT TCCTTTGCGA ACTTCTTCTC GCCCAATTTC
101 CATCTGAGGT GAACCTGCAA TTGTTAGCCA GGTGTTACTT GAGTAACAGT
151 CAAGCTTATA GTGCATATTA TATCCTTAAA GGTTCAAAAA CGCCTCAGTC
201 TCGGTATTTA TTTGCATTCT CATGCTTTAA GTTGGATCTT CTTGGAGAGG
251 CTGAAGCTGC ATTGTTGCCC TGTGAAGATT ATGCTGAAGA AGTTCCTGGT
301 GGTGCAGCTG GGCATTATCT TCTTGGTCTT ATATATAGAT ATTCTGGGAG
351 GAAGAACTGT TCAATACAAC AGTTTAGGAT GGCATTGTCA TTTGATCCAT
401 TGTGTTGGGA AGCATATGGA GAACTTTGTA GTTTAGGTGC CGCTGAAGAA
451 GCCTCAACAG TTTTCGGGAA TGTTGCTTCC CAGCGTCTTC AGAAAACTTG
501 TGTAGAACAA AGAATAAGCT TCTCAGAAGG AGCAACCATA GACCAGATTA
551 CAGATTCTGA TAAGGCCTTA AAAGATACAG GTTTATCGCA AACAGAACAC
601 ATTCCAGGAG AGAACCAACA AGATCTGAAA ATTATGCAGC AGCCTGGAGA
651 TATTCCACCA AATACTGACA GGCAACTTAG TACAAACGGA TGGGACTTGA
701 ACACACCTTC TCCAGTGCTT TTACAGGTAA TGGATGCTCT ACCGCCTCTG
751 CTTCTTAAGA ATATGCGTCG TCCAGCAGTG GAAGGATCTT TGATGTCTGT
801 ACATGGAGTG CGTGTGCGTC GAAGAAACTT TTTTAGTGAA GAATTGTCAG
851 CAGAGGCTCA AGAAGAATCT GGGCGCCGCC GTAGTGCTAG AATAGCAGCA
901 AGGAAAAAGA ATCCTATGTC GCAGTCATTT GGAAAAGATT CCCATTGGTT
951 ACATCTTTCA CCTTCCGAGT CAAACTATGC ACCTTCTCTT TCCTCGATGA
1001 TTGGAAAATG CAGAATCCAA AGCAGCAAAG AAGTGATTCC TGATACCGTT
1051 ACTCTAAATG ATCCAGCAAC GACGTCAGGC CAGTCTGTAA GTGACATTGG
1101 AAGCTCTGTT GATGATGAGG AAAAGTCAAA TCCTAGTGAA TCTTCCCCGG
1151 ATCGTTTCAG CCTTATTTCT GGAATTTCAG AAGTGCTAAG CCTTCTGAAA
1201 ATTCTTGGAG ATGGCCACAG GCATTTACAT ATGTACAAGT GTCAGGAAGC
1251 TTTGTTGGCA TATCAAAAGC TATCTCAGAA ACAATACAAT ACACACTGGG
1301 TTCTCATGCA GGTTGGAAAA GCATATTTTG AGCTACAAGA CTACTTCAAC
1351 GCTGACTCTT CCTTTACTCT TGCTCATCAA AAGTATCCTT ATGCTTTGGA
1401 AGGAATGGAT ACATACTCCA CTGTTCTTTA TCACCTGAAA GAAGAGATGA
1451 GGTTGGGCTA TCTGGCTCAG GAACTGATTT CAGTTGATCG CCTGTCTCCA
1501 GAATCCTGGT GTGCAGTTGG GAACTGTTAC AGTTTGCGTA AGGATCATGA
1551 TACTGCTCTC AAAATGTTTC AGAGAGCTAT CCAACTGAAT GAAAGATTCA
1601 CATATGCACA TACCCTTTGT GGCCACGAGT TTGCCGCATT GGAAGAATTC
1651 GAGGATGCAG AGAGATGCTA CCGGAAGGCT CTGGGCATAG ATACGAGACA
1701 CTATAATGCA TGGTACGGTC TTGGAATGAC CTATCTTCGT CAGGAGAAAT
1751 TCGAGTTTGC GCAGCATCAA TTTCAACTGG CTCTCCAAAT AAATCCAAGA
1801 TCTTCAGTCA TCATGTGTTA CTATGGAATT GCTTTGCATG AGTCAAAGAG
1851 AAACGATGAG GCGTTGATGA TGATGGAGAA GGCTGTACTC ACTGATGCAA
1901 AGAATCCGCT CCCCAAGTAC TACAAGGCTC ACATATTAAC CAGCCTAGGT
1951 GATTATCACA AAGCACAGAA AGTTTTAGAA GAGCTCAAAG AATGTGCTCC
2001 TCAAGAAAGC AGTGTCCATG CATCGCTTGG CAAAATATAC AATCAGCTAA
2051 AGCAATACGA CAAAGCCGTG TTACATTTCG GCATTGCTTT GGATTTAAGC
2101 CCTTCTCCAT CTGATGCTGT CAAGATAAAG GCTTACATGG AGAGGTTGAT
2151 ACTACCAGAC GAGCTGGTGA CGGAGGAAAA TTTGTAG
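
The stated length of 2187 nucleotides can be checked against the row numbering of the listing. The following is a minimal sketch, assuming that each row of the listing consists of the position of its first base followed by groups of bases, and that rows follow one another without gaps. The function name `parse_rows` and the constant `LAST_ROWS` are hypothetical; the two rows used are the final rows of the SEQ ID NO: 9 listing, each written out as one complete row.

```python
import re


def parse_rows(text):
    """Parse listing rows of the form '<start> <group> <group> ...'.

    Returns (first_position, sequence). Raises ValueError if a row's
    start number does not follow on from the previous row.
    """
    first = None
    sequence = ""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines between rows
        start = int(fields[0])
        bases = "".join(fields[1:]).upper()
        if not re.fullmatch(r"[ACGT]+", bases):
            raise ValueError(f"row {start}: unexpected characters")
        if first is None:
            first = start
        elif start != first + len(sequence):
            raise ValueError(f"row {start}: expected {first + len(sequence)}")
        sequence += bases
    if first is None:
        raise ValueError("no rows found")
    return first, sequence


# Final two rows of the SEQ ID NO: 9 listing.
LAST_ROWS = """
2101 CCTTCTCCAT CTGATGCTGT CAAGATAAAG GCTTACATGG AGAGGTTGAT
2151 ACTACCAGAC GAGCTGGTGA CGGAGGAAAA TTTGTAG
"""

first, seq = parse_rows(LAST_ROWS)
last = first + len(seq) - 1
print(first, last)  # prints "2101 2187"
```

The last position obtained in this way agrees with the length given under SEQUENCE CHARACTERISTICS.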
1. At least partially purified protein, capable of
modulating DNA replication in plants, at least comprising in
the amino acid sequence

a) one or more of the amino acid sequences chosen from the
group consisting of those given by SEQ ID NOS 2, 3
and 4,

b) one or more of the amino acid sequences chosen from the
group consisting of those given by SEQ ID NOS 6 and 7,

c) one or more amino acid sequences having at least 50%
amino acid identity with those of a), or

d) one or more amino acid sequences having at least 50%
amino acid identity with those of b).

2. Protein according to claim 1, comprising one or more
of the amino acid sequences according to c) or d), the
respective amino acid identity being at least 90%.

3. Protein according to claim 1 or 2, having the amino
acid sequence as given in SEQ ID No 1 or No 5, or having at least
80% amino acid identity with one of the said sequences.

4. Protein according to one or more of claims 1-3, being
a plant CDC7 protein or a functional analogue thereof.

5. Protein according to one or more of claims 1-3, being
a plant CDC27 protein or a functional analogue thereof.
6. Mutein of a protein according to one or more of the
preceding claims, comprising at least one amino acid
substitution, deletion or addition, affecting the DNA
replicative effect of the said protein.

7. Mutein according to claim 6, wherein at least one of
the phosphorylatable amino acids is deleted or substituted by
one or more non-phosphorylatable amino acids.

8. Peptide, comprising

a) one or more of the amino acid sequences chosen from the
group consisting of those given by SEQ ID NOS 2, 3
and 4,

b) one or more of the amino acid sequences chosen from the
group consisting of those given by SEQ ID NOS 6 and 7,

c) one or more amino acid sequences having at least 50%
amino acid identity with those of a), or

d) one or more amino acid sequences having at least 50%
amino acid identity with those of b).

9. Antibody, specifically recognizing a protein
according to any of the claims 1-5, a mutein according to any
of the claims 6-7 or a peptide according to claim 8.

10. Antibody according to claim 9, being at least
partially purified.



11. Non-genomic DNA sequence coding for a protein
according to one or more of claims 1-5, for a mutein according
to claim 6 or 7, or for a peptide according to claim 8, or DNA
sequence having a sequence homology of at least 75% with the said
sequence or the complementary DNA sequence thereof.

12. DNA sequence according to claim 11, being
substantially free of sequences intervening the coding
sequence.

13. DNA sequence according to claim 11 or 12, comprising
the DNA sequence as given by SEQ ID No 8 or SEQ ID No 9 or
having a sequence homology with SEQ ID No 8 or SEQ ID No 9 of
at least 75% or the complementary sequence thereof.

14. DNA sequence, coding for a peptide according to claim
8, corresponding to nucleotides 1229-1291, 2126-2187 or
2298-2385 of SEQ ID No 8, or to nucleotides 109-181 or 2128-2181 of
SEQ ID No 9, or a DNA sequence, having a sequence homology of
at least 75% to the said sequence or the complementary sequence
thereof.

15. DNA vector, at least comprising the DNA sequence
according to one of the claims 11-14.

16. DNA vector according to claim 15, further comprising
a promoter, functional in plant cells, operably linked to the
DNA sequence according to one of the claims 11-14.

17. DNA vector according to claim 15 or 16 comprising DNA
coding for a mutein according to claim 6 or 7, operably linked
to a nematode-induced promoter, functional in plant cells.
18. Method for modulating DNA replication in plant cells,
plant parts or plants by conferring to one or more plant cells
the capacity to provide a protein according to one or more of
claims 1-5, or a mutein thereof according to claim 6 or 7, in
an amount sufficient to modulate DNA replication and/or to
block mitosis of the said cells.

19. Method according to claim 18, wherein the said
capacity is conferred to one or more plant cells, by

a) transforming one or more plant cells with DNA
according to one of the claims 9-12 or with a DNA
vector according to one of the claims 13-15,

b) culturing the plant cells in order to regenerate
plant parts or plants from the transformed cells, or

c) incubating the cells, plant parts or plants at
conditions allowing expression of the said DNA to
produce the said protein or a mutein.

20. Method according to claim 18 or 19 for the generation
of polyploid plant cells, plant parts or plants.

21. Method for identifying and/or obtaining proteins
capable of modulating the DNA replication in plants, comprising
a two-hybrid screening assay, using CDC27 or CDC7
polynucleotide sequences as a bait and a cDNA library or of a
cell suspension culture as a prey.

22. Method for the production of transgenic plants, plant
cells or plant tissue, comprising the introduction of a nucleic
acid molecule according to any of the claims 11-14 or a vector
according to claim 15 or 16 into the genome of said plant,
plant cell or plant tissue.

23. Plant cell, transformed with a vector according to
one of the claims 15-16, or comprising the DNA according to one
of the claims 11-14.

24. Plant, obtainable by the method according to one or
more of claims 18-19.

25. Progeny of a plant according to claim 24.

26. Plant material such as roots, flowers, fruit, leaves,
pollen, seeds, seedlings or tubers, obtainable from a plant
according to claim 24 or 25.
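
Claims 1 to 3 and 8 rely on thresholds of amino acid identity (at least 50%, 80% or 90%). The following is a minimal sketch of one way such a percentage could be computed. It assumes that the two sequences have already been aligned to equal length, with `-` marking gaps, and that identity is counted over all aligned columns; neither the alignment method nor the denominator is specified in the claims, so both are assumptions made here for illustration. The function names and the two short peptide strings are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned columns in which both residues are identical.

    Both inputs must be pre-aligned strings of equal length; '-' is a gap.
    Columns that are gaps in both sequences are ignored (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold):
    """True if the identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy peptides, pre-aligned; one mismatch and one gap.
identity = percent_identity("MKTAYIAKQR", "MKTAYLAK-R")
print(identity)  # prints 80.0
```

With these toy strings, the 50% threshold is met and the 90% threshold is not. A different alignment or a different denominator (for instance the length of the shorter sequence) would give different values.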
The present invention relates to at least partially
purified protein, capable of modulating the DNA replication in
plants, muteins thereof, DNA coding therefor and to a method
to confer to one or more plant cells the capacity to provide
such a protein or mutein. The invention also relates to plants,
comprising the said DNA and the progeny thereof.
Fig. 3

[Figure: double-stranded nucleotide sequence, numbered in rows of 60 up to position 2699, with the deduced amino acid sequence printed beneath the coding strand. The sequence shown overlaps the listing given for SEQ ID NO: 8.]
[Figure: double-stranded nucleotide sequence, numbered in rows of 60 up to position 2512 and ending in a poly(A) stretch, with the deduced amino acid sequence printed beneath the coding strand. The sequence shown overlaps the listing given for SEQ ID NO: 9.]